RE: CopyField from text to multi value

2014-10-20 Thread Tomer Levi
Thanks, Walter!


Re: CopyField from text to multi value

2014-10-19 Thread Erick Erickson
Not quite sure what you're asking here. If you do a copyField, the raw
input is, well, copied to the destination field and _then_ the analysis
chain is applied. That seems to be what you want: the destination field
would be a text-based field, perhaps text_general or some such from the
distro.
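A minimal Python sketch of that order of operations, under loud assumptions: `COPY_FIELDS`, `copy_field_then_analyze` and the `text_tokens` destination are hypothetical names, and the regex-plus-lowercase `analyze` function is only a stand-in for a real analysis chain such as text_general. It shows that the destination receives the raw string (which is what a stored destination field would hand back), and that only its indexed terms are individual words.

```python
import re

# Hypothetical copyField rules: source field -> destination fields.
COPY_FIELDS = {"text": ["text_tokens"]}


def analyze(value: str) -> list[str]:
    """Toy stand-in for an analysis chain: split on word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def copy_field_then_analyze(doc: dict[str, str]) -> dict[str, dict]:
    """Model the order of operations: copy raw values first, analyze afterwards."""
    raw = dict(doc)
    # Step 1: copyField copies the *raw* input, before any analysis happens.
    for source, destinations in COPY_FIELDS.items():
        if source in doc:
            for dest in destinations:
                raw[dest] = doc[source]
    # Step 2: each field's own analysis chain runs on its raw value.
    return {
        field: {"stored": value, "indexed_terms": analyze(value)}
        for field, value in raw.items()
    }


result = copy_field_then_analyze({"text": "I am a solr user"})
print(result["text_tokens"]["stored"])         # I am a solr user
print(result["text_tokens"]["indexed_terms"])  # ['i', 'am', 'a', 'solr', 'user']
```

In this model, asking for the destination field back still returns one string, not a list of words; the words exist only as indexed terms.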

And perhaps there's some confusion about what multiValued means here. It
does _not_ mean tokenized, i.e. broken up into words. Non-multiValued
fields can be tokenized.

multiValued means that more than one entry for the field can be in a doc,
i.e. (using the XML form of an input doc as an example):

<add>
  <doc>
    <field name="multi">some text</field>
    <field name="multi">and now for something completely different</field>
  </doc>
</add>

will succeed with a field defined as multiValued="true", but fail with
one defined as multiValued="false".
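The multiValued rule can be modeled as a check on how many values a document supplies for each field. This is a hypothetical sketch: `SCHEMA`, `check_multivalued` and the error text are made up for illustration and are not Solr code. Note that it never looks at tokens at all.

```python
from collections import Counter

# Hypothetical schema: field name -> multiValued flag.
SCHEMA = {"multi": True, "single": False}


def check_multivalued(fields: list[tuple[str, str]]) -> None:
    """Raise if a non-multiValued field receives more than one value."""
    counts = Counter(name for name, _ in fields)
    for name, count in counts.items():
        if count > 1 and not SCHEMA[name]:
            raise ValueError(f"multiple values for non-multiValued field {name!r}")


# Mirrors the XML example: two values for "multi" are accepted.
check_multivalued([("multi", "some text"),
                   ("multi", "and now for something completely different")])

# The same two values sent to "single" are rejected.
try:
    check_multivalued([("single", "some text"),
                       ("single", "and now for something completely different")])
except ValueError as err:
    print(err)  # multiple values for non-multiValued field 'single'
```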

In either case, though, whether the input was broken up into multiple,
independently-searchable tokens (words) is orthogonal to whether it's
multiValued or not, and is entirely dependent on the analysis chain in the
fieldType for the field in question.
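To make the orthogonality concrete, here is a toy sketch in which `tokenized` (standing in for the fieldType's analysis chain) and the number of values (multiValued) are independent inputs. The whitespace split and the name `index_terms` are assumptions for illustration; the non-tokenized branch is modeled loosely on a string field, where each value is indexed as one term.

```python
def index_terms(values: list[str], tokenized: bool) -> list[str]:
    """Return the indexed terms for one field in one document.

    `values` has one entry for a single-valued field and several for a
    multiValued one; `tokenized` decides whether each value is split.
    """
    terms: list[str] = []
    for value in values:
        if tokenized:
            terms.extend(value.lower().split())  # toy analysis chain
        else:
            terms.append(value)                  # whole value is one term
    return terms


# Single-valued but tokenized: one value, five searchable terms.
print(index_terms(["I am a solr user"], tokenized=True))
# ['i', 'am', 'a', 'solr', 'user']

# multiValued but not tokenized: two values, two terms.
print(index_terms(["some text", "and now for something completely different"],
                  tokenized=False))
# ['some text', 'and now for something completely different']
```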

Best,
Erick

On Sun, Oct 19, 2014 at 9:07 AM, Tomer Levi <tomer.l...@nice.com> wrote:

 Hi,

 I would like to copy the content of a text field into a multi-valued field.

 For example, let’s say my field text contains: “I am a solr user”

 I would like to have a multi-valued copyField with the following content:
 [“I”, “am”, “a”, “solr”, “user”]



 Thanks,

 Tomer Levi
 Software Engineer
 Big Data Group
 Product & Technology Unit
 (T) +972 (9) 775-2693
 tomer.l...@nice.com
 www.nice.com


RE: CopyField from text to multi value

2014-10-19 Thread Tomer Levi

Hi Erick,
Thanks for the explanation. I understand that the analysis chain is applied 
after the raw input was copied.
I need to store the output of the analysis chain as a new multi-value field, 
and I think that ShingleFilterFactory might do that. Is that right?
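For comparison, a shingle filter adds word n-grams as extra tokens in the analysis stream; it does not change what is stored. The sketch below imitates, under simplifying assumptions, the token sequence expected from shingling with a shingle size of 2 and unigrams kept. `shingles` is a hypothetical helper, not the ShingleFilterFactory API, and it ignores positions and offsets.

```python
def shingles(tokens: list[str], min_size: int = 2, max_size: int = 2,
             output_unigrams: bool = True) -> list[str]:
    """Emit, for each start position, the unigram and then longer word n-grams."""
    out: list[str] = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


print(shingles(["i", "am", "a", "solr", "user"]))
# ['i', 'i am', 'am', 'am a', 'a', 'a solr', 'solr', 'solr user', 'user']
```

So shingles add phrase-like terms to the index rather than producing the stored list of words asked for.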

Tomer


Re: CopyField from text to multi value

2014-10-19 Thread Jack Krupansky
As always, you need to first examine how you intend to query the fields before 
you dive into data modeling. In this case, is there any particular reason that 
you need the individual terms as separate values, as opposed to simply using a 
tokenized text field?

-- Jack Krupansky
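Jack's point can be illustrated with a toy inverted index: a single-valued field whose text is tokenized is already searchable word by word, so separate values are not needed just to match individual terms. `build_index` and its whitespace-plus-lowercase tokenization are simplifying assumptions, not Solr internals.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased whitespace token to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


index = build_index({"1": "I am a solr user", "2": "something completely different"})
print(sorted(index["solr"]))  # ['1']
```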


Re: CopyField from text to multi value

2014-10-19 Thread Erick Erickson
This really feels like an XY problem, which I think Jack is alluding to.

bq:  I understand that the analysis chain is applied after the raw
input was copied.
I need to store the output of the analysis chain as a new multi-value field

This statement is really confusing. You can't use the output of the analysis
chain as the input to a copyField; it just doesn't work that way, yet that is
what you seem to want to do in the second sentence. Then you bring shingles
into the picture...
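If stored per-word values really were required, the words would have to be produced before the document reaches Solr. A hypothetical client-side sketch: it splits the text naively on whitespace (Solr's analysis chain is not involved) and builds an XML update message with one field element per word. The field names `id`, `text` and `text_tokens` are illustrative assumptions, and the token field would need to be declared multiValued.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id: str, text: str, token_field: str = "text_tokens") -> str:
    """Build an <add> update message with the text split into separate values."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = text
    # Naive whitespace split; Solr's own analysis chain is NOT applied here.
    for word in text.split():
        ET.SubElement(doc, "field", name=token_field).text = word
    return ET.tostring(add, encoding="unicode")


print(build_add_xml("1", "I am a solr user"))
```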

So let's take Jack's suggestion and back up: tell us what use case
you're trying to support, rather than leaving us to guess what problem
you're trying to solve.

Best,
Erick


RE: CopyField from text to multi value

2014-10-19 Thread Tomer Levi
Thanks again for the help.

The use case is this.

In my UI I would like to indicate which words led to each document in the 
response.

It actually seems like a simple highlighting case, but instead of getting the 
highlight result as "this is a <br>long</br> string <br>with</br> text", 
our UI team wants a list of words, i.e. [long, with].



So I assumed that I could just tokenize the original text, copy the tokens 
into a new multi-valued field, and ask Solr to highlight that multi-valued field.
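For the UI requirement above, one option is to keep highlighting the ordinary text field and pull the marked words out of each snippet on the client. A minimal sketch, assuming the snippet wraps matches in fixed pre/post markers (Solr's simple highlighter defaults to `<em>`/`</em>`, configurable with `hl.simple.pre`/`hl.simple.post`); `highlighted_terms` is a hypothetical helper.

```python
import re


def highlighted_terms(snippet: str, pre: str = "<em>", post: str = "</em>") -> list[str]:
    """Return the distinct highlighted words in a snippet, in order of appearance."""
    pattern = re.escape(pre) + r"(.*?)" + re.escape(post)
    terms: list[str] = []
    for term in re.findall(pattern, snippet):
        if term not in terms:  # keep only the first occurrence of each word
            terms.append(term)
    return terms


print(highlighted_terms("this is a <em>long</em> string <em>with</em> text"))
# ['long', 'with']
print(highlighted_terms("this is a <br>long</br> string <br>with</br> text",
                        pre="<br>", post="</br>"))
# ['long', 'with']
```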



That is my use case.

Thanks again

Tomer





Re: CopyField from text to multi value

2014-10-19 Thread Walter Underwood
I think that info is available with termvectors. That should give a list of the 
query terms that matched each document, if I understand it correctly.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
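Once the terms of a document are available (for example from term vectors), finding which query terms matched is a set intersection. The sketch assumes both lists have already been through the same analysis (lowercasing, stemming and so on) and does not model any Solr response format; `matched_terms` is a hypothetical helper.

```python
def matched_terms(query_terms: list[str], doc_terms: list[str]) -> list[str]:
    """Return the query terms present in the document, preserving query order."""
    doc_set = set(doc_terms)
    return [term for term in query_terms if term in doc_set]


doc_terms = ["this", "is", "a", "long", "string", "with", "text"]
print(matched_terms(["long", "with", "missing"], doc_terms))  # ['long', 'with']
```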

