RE: CopyField from text to multi value
Thanks Walter!

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Monday, October 20, 2014 12:09 AM
To: solr-user@lucene.apache.org
Subject: Re: CopyField from text to multi value

I think that info is available with termvectors. That should give a list of the query terms that matched each document, if I understand it correctly.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
Re: CopyField from text to multi value
Not quite sure what you're asking here. If you do a copyField, the raw input is, well, copied to the destination field and _then_ the analysis chain is applied. That seems to be what you want: the destination field would be a text-based field, perhaps text_general or some such from the distro.

And perhaps there's some confusion about what multiValued means here. It does _not_ mean tokenized, i.e. broken up into words. Non-multiValued fields can be tokenized. multiValued means that more than one entry for the field can be in a doc. I.e. (using the XML form of an input doc as an example)

  <add>
    <doc>
      <field name="multi">some text</field>
      <field name="multi">and now for something completely different</field>
    </doc>
  </add>

will succeed with a field defined as multiValued="true", but fail with one defined as multiValued="false". In either case, though, whether the input is broken up into multiple, independently searchable tokens (words) is orthogonal to whether it's multiValued or not, and is entirely dependent on the analysis chain in the fieldType for the field in question.

Best,
Erick

On Sun, Oct 19, 2014 at 9:07 AM, Tomer Levi tomer.l...@nice.com wrote:

Hi,

I would like to copy a textual field content into a multivalue field. For example, let's say my field text contains:

“I am a solr user”

I would like to have a multi-value copyField with the following content:

[“I”, “am”, “a”, “solr”, “user”]

Thanks,
Tomer Levi
Software Engineer
Big Data Group
Product Technology Unit
(T) +972 (9) 775-2693
tomer.l...@nice.com
www.nice.com
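To illustrate the point that multiValued and tokenization are independent of each other, here is a minimal toy model in plain Python. It is not Solr code and does not use Solr's API; the names `analyze` and `index_field` are made up for this sketch, and the "analysis chain" is reduced to lowercasing plus splitting on non-word characters.

```python
# Toy model only: plain Python, not Solr code or Solr's API.
import re


def analyze(value):
    """Stand-in for an analysis chain: lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", value.lower()) if tok]


def index_field(values, multi_valued, tokenized):
    """Return the searchable terms for one field of one document.

    values       -- list of raw entries supplied for the field
    multi_valued -- whether more than one entry is allowed
    tokenized    -- whether each entry is passed through analyze()
    """
    if not multi_valued and len(values) > 1:
        raise ValueError("multiple values for a non-multiValued field")
    terms = []
    for value in values:
        terms.extend(analyze(value) if tokenized else [value])
    return terms


# Single-valued but tokenized: one entry, five searchable terms.
single = index_field(["I am a solr user"], multi_valued=False, tokenized=True)

# Multi-valued but not tokenized: two entries, each kept as one whole term.
multi = index_field(
    ["some text", "and now for something completely different"],
    multi_valued=True,
    tokenized=False,
)
```

In this model the single-valued field still ends up with one term per word, while the multi-valued field holds two untokenized entries, which is the distinction described above.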
RE: CopyField from text to multi value
Hi Erick,

Thanks for the explanation. I understand that the analysis chain is applied after the raw input is copied. I need to store the output of the analysis chain as a new multi-value field, and I think that ShingleFilterFactory might do that, wouldn't it?

Tomer

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, October 19, 2014 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: CopyField from text to multi value

Not quite sure what you're asking here. If you do a copyField, the raw input is, well, copied to the destination field and _then_ the analysis chain is applied. That seems to be what you want: the destination field would be a text-based field, perhaps text_general or some such from the distro.

And perhaps there's some confusion about what multiValued means here. It does _not_ mean tokenized, i.e. broken up into words. Non-multiValued fields can be tokenized. multiValued means that more than one entry for the field can be in a doc. I.e. (using the XML form of an input doc as an example)

  <add>
    <doc>
      <field name="multi">some text</field>
      <field name="multi">and now for something completely different</field>
    </doc>
  </add>

will succeed with a field defined as multiValued="true", but fail with one defined as multiValued="false". In either case, though, whether the input is broken up into multiple, independently searchable tokens (words) is orthogonal to whether it's multiValued or not, and is entirely dependent on the analysis chain in the fieldType for the field in question.

Best,
Erick
Re: CopyField from text to multi value
As always, you need to first examine how you intend to query the fields before you dive into data modeling. In this case, is there any particular reason that you need the individual terms as separate values, as opposed to simply using a tokenized text field?

-- Jack Krupansky

From: Tomer Levi
Sent: Sunday, October 19, 2014 9:07 AM
To: solr-user@lucene.apache.org
Subject: CopyField from text to multi value

Hi,

I would like to copy a textual field content into a multivalue field. For example, let's say my field text contains:

“I am a solr user”

I would like to have a multi-value copyField with the following content:

[“I”, “am”, “a”, “solr”, “user”]

Thanks,
Tomer Levi
Software Engineer
Big Data Group
Product Technology Unit
(T) +972 (9) 775-2693
tomer.l...@nice.com
www.nice.com
Re: CopyField from text to multi value
This really feels like an XY problem, which I think Jack is alluding to.

bq: I understand that the analysis chain is applied after the raw input was copied. I need to store the output of the analysis chain as a new multi-value field

This statement is really confusing. You can't have the output of the analysis chain used as input to a copyField, which is what you seem to want to do with the second sentence; it just doesn't work that way. Then you bring shingles into the picture...

So let's take Jack's suggestion and back up: tell us what use case you're trying to support, rather than leaving us to guess what problem you're trying to solve.

Best,
Erick

On Sun, Oct 19, 2014 at 9:43 AM, Jack Krupansky j...@basetechnology.com wrote:

As always, you need to first examine how you intend to query the fields before you dive into data modeling. In this case, is there any particular reason that you need the individual terms as separate values, as opposed to simply using a tokenized text field?

-- Jack Krupansky
RE: CopyField from text to multi value
Thanks again for the help.

The use case is this. In my UI I would like to indicate which words led to each document in the response. It actually seems like a simple highlight case, but instead of getting the highlight result as "this is a <br>long</br> string <br>with</br> text", our UI team wants a list of words, i.e. [long, with].

So, I assumed that I can just:
- tokenize the original text
- copy the tokens into new multi-value fields
- ask Solr to highlight the multi-value field

That is my use case.

Thanks again,
Tomer

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, October 19, 2014 5:18 PM
To: solr-user@lucene.apache.org
Subject: Re: CopyField from text to multi value

This really feels like an XY problem, which I think Jack is alluding to.

bq: I understand that the analysis chain is applied after the raw input was copied. I need to store the output of the analysis chain as a new multi-value field

This statement is really confusing. You can't have the output of the analysis chain used as input to a copyField, which is what you seem to want to do with the second sentence; it just doesn't work that way. Then you bring shingles into the picture...

So let's take Jack's suggestion and back up: tell us what use case you're trying to support, rather than leaving us to guess what problem you're trying to solve.

Best,
Erick
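One way to get a list of matched words without changing the schema might be to post-process the ordinary highlighting output on the client. The sketch below assumes a JSON response whose `highlighting` section maps document id to field name to a list of snippets, and assumes the default `<em>`/`</em>` markers (configurable through `hl.simple.pre` and `hl.simple.post`). The function name `highlighted_terms` and the sample data are hypothetical.

```python
import re


def highlighted_terms(highlighting, field, pre="<em>", post="</em>"):
    """Map each document id to the distinct terms wrapped in pre/post markers."""
    pattern = re.compile(re.escape(pre) + r"(.*?)" + re.escape(post), re.DOTALL)
    result = {}
    for doc_id, fields in highlighting.items():
        seen = []
        for snippet in fields.get(field, []):
            for term in pattern.findall(snippet):
                # Keep first-seen order and drop repeats.
                if term not in seen:
                    seen.append(term)
        result[doc_id] = seen
    return result


# Hypothetical "highlighting" section, already parsed from a JSON response.
sample = {
    "doc1": {"text": ["this is a <em>long</em> string <em>with</em> text"]},
}
terms = highlighted_terms(sample, "text")
```

For the sample above, `terms["doc1"]` holds the two highlighted words in the order they appear. Only terms that fall inside the returned snippets are seen, so snippet count and fragment size settings would affect the result.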
Re: CopyField from text to multi value
I think that info is available with termvectors. That should give a list of the query terms that matched each document, if I understand it correctly.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Oct 19, 2014, at 7:37 AM, Tomer Levi tomer.l...@nice.com wrote:

Thanks again for the help.

The use case is this. In my UI I would like to indicate which words led to each document in the response. It actually seems like a simple highlight case, but instead of getting the highlight result as "this is a <br>long</br> string <br>with</br> text", our UI team wants a list of words, i.e. [long, with].

So, I assumed that I can just:
- tokenize the original text
- copy the tokens into new multi-value fields
- ask Solr to highlight the multi-value field

That is my use case.

Thanks again,
Tomer
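A note on the term vector suggestion: as far as I know, term vectors list the terms stored for a document's field rather than only the ones a query matched, so the matching step would probably have to happen on the client. The sketch below assumes the document's terms have already been pulled out of a term vector response and that the query terms were analyzed the same way; parsing the response is left out, and the name `matched_terms` is hypothetical.

```python
def matched_terms(query_terms, doc_terms):
    """Return the query terms that also occur in the document, in query order."""
    present = set(doc_terms)
    matched = []
    for term in query_terms:
        if term in present and term not in matched:
            matched.append(term)
    return matched


# Hypothetical inputs: terms from one document's term vector, and the
# already-analyzed terms of the user's query.
doc_terms = ["this", "is", "a", "long", "string", "with", "text"]
result = matched_terms(["long", "with", "missing"], doc_terms)
```

This is plain set intersection that preserves query order; it ignores phrase queries, wildcards, and anything else where a match is not a simple term lookup.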