Re: Data import
OK, that makes sense. But when Solr runs a data import, it can tell that an incoming document has the same uniqueKey as one that is already indexed, right? Because when the same document still exists on the source, Solr deletes the indexed copy and creates a new one. Instead of deleting and re-creating it, is it not possible to simply discard the incoming document?

On Sep 10, 2013, at 2:16 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

> Sounds like you want a custom UpdateRequestProcessor chain that checks whether a document with the given primary key already exists and, if it does, does not even bother passing it on to the next processor in the chain.
>
> This would make sense as an optimization, or as a first step in a complex update chain that perhaps uses a lot of external resources to pre-process the content (e.g. named entity extraction).
>
> I don't think such a URP exists at the moment. But it should be simple to write one, assuming URPs can do lookups by primary ID and make go/no-go decisions on individual documents. Does anybody know the details of this?
>
> Regards,
> Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>
> On Tue, Sep 10, 2013 at 7:53 AM, Luis Portela Afonso meligalet...@gmail.com wrote:
>
>> But with atomic updates I need to send the information myself, right? I want Solr to index it automatically, and it is doing that. Can you look at the Solr example in the source? There is an example in the example-DIH folder. Imagine that you run the import URL every 15 minutes. If the same information is already indexed, Solr will update it, and by update I mean delete and index again. I just want Solr to simply discard the information if it is already indexed.
>>
>> On Tuesday, September 10, 2013, Chris Hostetter wrote:
>>
>>> : With a cron job, I do an HTTP request using curl to the address
>>> : http://localhost:port/solr/core/dataimport/?command=full-import&clean=false
>>> :
>>> : When it runs, if the RSS source has a feed item that is already indexed in Solr,
>>> : it updates the existing entry.
>>> : So if the source has the same information as the destination, it updates the
>>> : information on the destination.
>>> :
>>> : I want to prevent that. Is that clear? I may try to provide some
>>> : examples.
>>>
>>> Yes, specific examples would be helpful. It's not really clear what it is that you want to prevent. Please note the URL I mentioned before and use it as a guideline for how much detail we need to understand what it is you are asking...
>>>
>>> : Can you please be more specific about what you would like to see happen, so
>>> : we can better understand what your actual goal is? It's really not clear.
>>> : https://wiki.apache.org/solr/UsingMailingLists
>>>
>>> -Hoss
>>
>> --
>> Sent from Gmail Mobile
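For illustration only, here is a rough sketch of how the custom processor Alex describes might be wired into solrconfig.xml once it has been written. The factory class com.example.SkipExistingDocumentProcessorFactory and the chain name skip-existing are hypothetical; nothing in this thread confirms that such a class ships with Solr 4.4, so it would have to be implemented and placed on Solr's classpath. LogUpdateProcessorFactory, RunUpdateProcessorFactory and the rss-data-config.xml file name are taken from other messages in this archive.

```xml
<!-- Sketch under stated assumptions: the first processor is a HYPOTHETICAL custom class -->
<updateRequestProcessorChain name="skip-existing">
  <!-- would look up the incoming uniqueKey and drop the document if it is already indexed -->
  <processor class="com.example.SkipExistingDocumentProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Point the DataImportHandler at that single chain -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">rss-data-config.xml</str>
    <str name="update.chain">skip-existing</str>
  </lst>
</requestHandler>
```

Putting the check first means a skipped document never reaches RunUpdateProcessorFactory, so the delete-and-re-add described above would not happen for it.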
Re: Javascript StatelessScriptUpdateProcessor
Solved.

On Sep 10, 2013, at 4:55 PM, Luís Portela Afonso meligalet...@gmail.com wrote:

> Is it possible to execute queries from a JavaScript script in StatelessScriptUpdateProcessor? I'm processing data with a script, and I want to run a query against the data already indexed in Solr. I know that the script has access to instances of SolrQueryRequest and SolrQueryResponse, but I have not been able to use either of them for this.
Javascript StatelessScriptUpdateProcessor
Is it possible to execute queries from a JavaScript script in StatelessScriptUpdateProcessor? I'm processing data with a script, and I want to run a query against the data already indexed in Solr. I know that the script has access to instances of SolrQueryRequest and SolrQueryResponse, but I have not been able to use either of them for this.
Re: Data import
When I run dataimport/?command=full-import&clean=false, Solr adds new documents with the information. But if the same information already exists with the same uniqueKey, it replaces the existing document with a new one. It does not update the document; it creates a new one. Is it possible to avoid that? I'm indexing RSS feeds. I ran the RSS example that ships with the Solr examples, and it does that.

On Sep 9, 2013, at 4:10 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

> What do you specifically mean by disabling document update? Do you mean in-place update? Or do you mean you want to run the import but not actually populate the Solr collection with the processed documents?
>
> It might help to explain the business-level goal you are trying to achieve, or the specific error that you are perhaps seeing and trying to avoid.
>
> Regards,
> Alex.
>
> On Mon, Sep 9, 2013 at 6:42 AM, Luís Portela Afonso meligalet...@gmail.com wrote:
>
>> Hi, Is it possible to disable document updates when running a data import with the full-import command? Thanks
Data import
Hi, Is it possible to disable document updates when running a data import with the full-import command? Thanks
Re: Solr documents update on index
Hi, But I'm indexing RSS feeds. I want Solr to index them without changing the existing information of a document with the same uniqueKey. Ideally Solr would update the doc only if changes are detected, but I can live without that. I would really like Solr not to update the document if it already exists. I'm using the DataImportScheduler to launch the scheduled indexing. I appreciate any possible help.

On Sep 6, 2013, at 9:16 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> Yes, if a document with the same key exists, then the old document will be deleted and replaced with the new document. You can also partially update documents (we call it atomic updates), which reads the old document from the local index, updates it according to the request and then replaces the old document with the new one.
>
> See https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-UpdatingOnlyPartofaDocument
>
> On Fri, Sep 6, 2013 at 1:03 AM, Luis Portela Afonso meligalet...@gmail.com wrote:
>
>> Hi, I'm having a problem when Solr indexes. It is updating documents that are already indexed. Is this normal behavior? If a document with the same key already exists, is it supposed to be updated? I was thinking it was supposed to update only if the information in the RSS feed has changed. I appreciate your help.
>>
>> --
>> Sent from Gmail Mobile
>
> --
> Regards,
> Shalin Shekhar Mangar.
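As a sketch of the atomic update Shalin mentions, a request body posted to the /update handler could look like the following. The field names id and title and their values are placeholders chosen for illustration, not taken from this thread. This assumes the update log is enabled in solrconfig.xml and that the fields being preserved are stored, which atomic updates depend on.

```xml
<!-- Sketch: "id" is assumed to be the uniqueKey; "title" is a placeholder field -->
<add>
  <doc>
    <field name="id">http://example.com/feed/item-1</field>
    <!-- update="set" replaces only this field; other stored fields are carried over -->
    <field name="title" update="set">Updated title</field>
  </doc>
</add>
```

Note that the client has to send this request itself; it is not something the full-import command produces on its own, and internally the document is still re-indexed.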
Re: SOLR Prevent solr of modifying fields when update doc
Hi, thanks for the answer, but the uniqueId is generated by me. When Solr indexes and there is an update to a doc, it deletes the doc and creates a new one, so it generates a new UUID. That is not suitable for me; I want Solr to update just some fields, because the UUID is the key that I use to map the doc to a user in my database. Right now, as my uniqueId, I'm using information that comes from the source and never changes, for example the guid that exists in some RSS feeds, or the link if the guid doesn't exist. I think there isn't any simple solution for me, because from what I have read, when a doc is updated, Solr deletes the old one and creates a new one, right?

On Aug 23, 2013, at 12:07 PM, Erick Erickson erickerick...@gmail.com wrote:

> Well, not much in the way of help because you can't do what you want AFAIK. I don't think UUID is suitable for your use-case. Why not use your uniqueId? Or generate something yourself...
>
> Best
> Erick
>
> On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso meligalet...@gmail.com wrote:
>
>> Hi, How can I prevent Solr from updating some fields when updating a doc? The problem is, I have a UUID in a field named uuid, but it is not the unique key. When an RSS source updates a feed, Solr updates the doc with the same link but generates a new uuid. This is not what I want, because I use this id to relate feeds to a user. Can someone help me? Many thanks
SOLR Prevent solr of modifying fields when update doc
Hi, How can I prevent Solr from updating some fields when updating a doc? The problem is, I have a UUID in a field named uuid, but it is not the unique key. When an RSS source updates a feed, Solr updates the doc with the same link but generates a new uuid. This is not what I want, because I use this id to relate feeds to a user. Can someone help me? Many thanks
SOLR Copy field if no value on destination
Hi, Is it possible to copy the value of one field to another only if the destination doesn't have a value?

An example:

  a. Indexing an RSS feed.
  b. The feed has the fields link and guid, but sometimes guid is not present in the feed.
  c. I have a field named finalLink that I will copy values into.

Now I want to copy guid to finalLink, but if guid has no value I want to copy link instead. My question is: is that possible using just the schema, processors, solrconfig.xml and the data-config? Thanks a lot
Re: SOLR Copy field if no value on destination
Oh yeah. I had seen that processor in the book and was not able to remember it. Thanks a lot. And thanks a lot for your solution. It works :)

On Aug 8, 2013, at 1:52 AM, Jack Krupansky j...@basetechnology.com wrote:

> Here's the actual update processor I used (and tested):
>
>   <updateRequestProcessorChain name="first-default-field">
>     <processor class="solr.CloneFieldUpdateProcessorFactory">
>       <str name="source">main_s</str>
>       <str name="dest">final_s</str>
>     </processor>
>     <processor class="solr.CloneFieldUpdateProcessorFactory">
>       <str name="source">backup_s</str>
>       <str name="dest">final_s</str>
>     </processor>
>     <processor class="solr.FirstFieldValueUpdateProcessorFactory">
>       <str name="fieldName">final_s</str>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Jack Krupansky
> Sent: Wednesday, August 07, 2013 8:20 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR Copy field if no value on destination
>
>> Sorry, I am unable to untangle the logic you are expressing, but I can assure you that JavaScript and the StatelessScriptUpdate processor have full support for implementing spaghetti code logic as tangled as desired!
>>
>> Simpler forms of logic can be implemented directly using non-script update processor sequences, but once you start adding conditionals, there is a 50% chance that you will need a script.
>>
>> There is a Default Value update processor, but it takes a literal value.
>>
>> Hmmm... maybe I’ll come up with a “default-value” script that takes a field name for the default value. IOW, it would copy a specified field to the destination IFF the destination had no value.
>>
>> Ahhh... wait... maybe... you could do this with the First Value Update processor:
>>
>> 1. Copy guid to FinalLink. (Clone Update processor.)
>> 2. Copy link to FinalLink. (Clone Update processor.)
>> 3. First Value Update processor.
>>
>> So, step 3 would leave link if guid was not there, or keep guid if it is there and discard link.
>>
>> Yes, that should do it.
>>
>> This is worth an example in the book! Thanks for the inspiration!
>>
>> -- Jack Krupansky
>>
>> From: Luís Portela Afonso
>> Sent: Wednesday, August 07, 2013 7:22 PM
>> To: solr-user@lucene.apache.org
>> Subject: SOLR Copy field if no value on destination
>>
>>> Hi, Is it possible to copy the value of one field to another only if the destination doesn't have a value?
>>>
>>> An example:
>>>
>>>   a. Indexing an RSS feed.
>>>   b. The feed has the fields link and guid, but sometimes guid is not present in the feed.
>>>   c. I have a field named finalLink that I will copy values into.
>>>
>>> Now I want to copy guid to finalLink, but if guid has no value I want to copy link instead. My question is: is that possible using just the schema, processors, solrconfig.xml and the data-config? Thanks a lot
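Applied to the field names from the original question, Jack's three steps might look like the sketch below. It is an adaptation of his tested chain rather than something posted in the thread: the chain name guid-or-link is made up, and guid, link and finalLink are assumed to be the names used in the schema and data-config.

```xml
<!-- Sketch: same processor sequence as the tested chain, with the question's field names substituted -->
<updateRequestProcessorChain name="guid-or-link">
  <!-- 1. Copy guid to finalLink (adds nothing when guid is absent) -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">guid</str>
    <str name="dest">finalLink</str>
  </processor>
  <!-- 2. Copy link to finalLink, after guid -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">link</str>
    <str name="dest">finalLink</str>
  </processor>
  <!-- 3. Keep only the first value: guid if it was present, otherwise link -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">finalLink</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The order of the two clone steps is what encodes the preference: whichever source is cloned first wins when both are present.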
SOLR FieldCopyProcessorFactory
Hi, Does something like a FieldCopyProcessorFactory exist? I know there is a CloneFieldProcessor, but I'm interested in doing an append. Is that possible? Many thanks
Field append
Hi there, Is it possible to append two fields in Solr? I would like to append two fields with a custom delimiter. Is that possible?

I saw something like a CloneFieldUpdateProcessor, but when I try to use it, Solr says that it cannot find the class. I saw it on the following page: https://issues.apache.org/jira/browse/SOLR-2599

In the comments I saw:

  <processor class="solr.FieldCopyProcessorFactory">
    <str name="source">category</str>
    <str name="dest">category_s</str>
  </processor>

But I'm not able to use that either. Once again Solr says that it cannot find the class. I hope you can help in any way. Thanks
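One possible way to get an append with a delimiter, sketched here as an assumption rather than as the answer given on the list, is to clone both source fields into one destination and then join the values with solr.ConcatFieldUpdateProcessorFactory. The field names first_s, second_s and combined_s and the chain name append-fields are placeholders. Note that the class name used below is solr.CloneFieldUpdateProcessorFactory (with the Factory suffix), which is the name in the Javadoc linked elsewhere in this archive.

```xml
<!-- Sketch: field names and chain name are illustrative placeholders -->
<updateRequestProcessorChain name="append-fields">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">first_s</str>
    <str name="dest">combined_s</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">second_s</str>
    <str name="dest">combined_s</str>
  </processor>
  <!-- Join the values collected in combined_s into one string, using a custom delimiter -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">combined_s</str>
    <str name="delimiter">, </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The values are concatenated in the order they were cloned, so swapping the two clone steps swaps the order in the result.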
Re: Solr PolyField
Hi, I have tried the solr.CloneFieldUpdateProcessorFactory suggested in this thread, but the fields are not copied.

My dataconfig.xml:

  <field column="enclosure_type" xpath="/rss/channel/item/enclosure/@type" />

My schema.xml:

  <dynamicField name="enclosure_*" type="string" indexed="false" stored="true" multiValued="true" />
  <!-- </field> -->
  <!-- <dynamicField name="enclosure_*" type="string" indexed="false" stored="true" multiValued="false" /> -->
  <field name="enclosure" type="text" indexed="true" stored="true" multiValued="true" />

My solrconfig.xml:

  <updateRequestProcessorChain name="multiple-clones">
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">enclosure_title</str>
      <str name="dest">enclosure</str>
    </processor>
  </updateRequestProcessorChain>

and

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">rss-data-config.xml</str>
      <str name="update.chain">multiple-clones</str>
      <str name="update.chain">fixIndexedValues</str>
    </lst>
  </requestHandler>

Can you help? Thanks ;)

On Jul 31, 2013, at 6:03 PM, Luís Portela Afonso meligalet...@gmail.com wrote:

> Ok, thanks. I will check it.
>
> On Jul 31, 2013, at 5:08 PM, Jack Krupansky j...@basetechnology.com wrote:
>
>> See: https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
>>
>> I have more examples in my book.
>>
>> -- Jack Krupansky
Re: Solr PolyField
So I have merged the two chains into one, and it is still not copying. Hum…

My solrconfig.xml:

  <updateRequestProcessorChain name="fixIndexedValues">
    <!--
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">uuid</str>
    </processor>
    -->
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">enclosure_title</str>
      <str name="dest">enclosure</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

I also tried with the UUIDUpdateProcessorFactory commented out, and nothing happens. Weird.

On Aug 1, 2013, at 5:37 PM, Jack Krupansky j...@basetechnology.com wrote:

> Hmmm... not sure what happens if you have two update chains specified:
>
>   <str name="update.chain">multiple-clones</str>
>   <str name="update.chain">fixIndexedValues</str>
>
> You need to merge them into one.
>
> -- Jack Krupansky
Re: Solr PolyField
Oh my god. Thanks for noticing. The field name is wrong. It should be enclosure_type. I'm so sorry.

On Aug 1, 2013, at 6:33 PM, Jack Krupansky j...@basetechnology.com wrote:

> Are you sure the “enclosure_title” field is populated?
>
> Have you updated the request handler?
>
> -- Jack Krupansky
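Putting the two fixes from this thread together (a single merged chain, and enclosure_type as the source field), the relevant part of solrconfig.xml would presumably end up looking like this. It is assembled from the snippets quoted in the thread and is a sketch, not a configuration the poster confirmed verbatim.

```xml
<!-- Sketch: merged chain with the corrected source field name -->
<updateRequestProcessorChain name="fixIndexedValues">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <!-- was enclosure_title, which the data-config never populates -->
    <str name="source">enclosure_type</str>
    <str name="dest">enclosure</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Only one update.chain entry, pointing at the merged chain -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">rss-data-config.xml</str>
    <str name="update.chain">fixIndexedValues</str>
  </lst>
</requestHandler>
```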
Solr PolyField
Hi, I'm trying to create a field with multiple fields inside it, that is, to get something like this:

  origin: {
    htmlUrl: "http://www.gazzetta.it/",
    streamId: "feed/http://www.gazzetta.it/rss/Home.xml",
    title: "Gazzetta.it"
  },

Is that possible? I'm using Solr 4.4.0. Thanks
Re: Solr PolyField
Hi, I'm trying to index information from RSS feeds. In more detail, the RSS feed has something like:

  <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/>

With my current configuration this works, and I get a result like this:

  enclosure: [
    "audio/mpeg",
    "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
    "37521428"
  ],

BUT this is not the result that I'm trying to reach. With that, I'm not able to know reliably whether audio/mpeg is the type, the url or the length. I want to reach something like:

  enclosure: {
    type: "audio/mpeg",
    url: "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
    length: 37521428
  },

So, the way I see it, this should be 3 fields inside another field, no? Many thanks for the answer and the help.

On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote:

> Nope. Solr fields are flat. Why do you want to do this? I'm asking because this might be an XY problem and there may be other possibilities.
>
> Best
> Erick
>
> On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso meligalet...@gmail.com wrote:
>
>> Hi, I'm trying to create a field with multiple fields inside it, that is, to get something like this:
>>
>>   origin: {
>>     htmlUrl: "http://www.gazzetta.it/",
>>     streamId: "feed/http://www.gazzetta.it/rss/Home.xml",
>>     title: "Gazzetta.it"
>>   },
>>
>> Is that possible? I'm using Solr 4.4.0. Thanks
Re: Solr PolyField
These fields can be multiValued. In the RSS standard it is not correct to do that, but some sources do, and I would like to capture it all. Is there any way to make that possible? Once again, many thanks :)

On Jul 31, 2013, at 3:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

> Luís,
>
> Is there a reason why splitting this up into enclosure_type, enclosure_url, and enclosure_length would not work?
>
> Michael Della Bitta
> Applications Developer
> o: +1 646 532 3062 | c: +1 917 477 7906
> appinions inc.
> “The Science of Influence Marketing”
> 18 East 41st Street, New York, NY 10017
> t: @appinions https://twitter.com/Appinions | g+: https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> w: appinions.com http://www.appinions.com/
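Michael's suggestion of three flat fields could be expressed in the data-config roughly as follows. The enclosure_type mapping is the one posted elsewhere in this thread; the enclosure_url and enclosure_length lines are assumed by analogy, using the url and length attributes visible on the sample enclosure element. All three names match the enclosure_* dynamicField from the posted schema.

```xml
<!-- Sketch: one flat, multiValued field per enclosure attribute -->
<field column="enclosure_type"   xpath="/rss/channel/item/enclosure/@type" />
<field column="enclosure_url"    xpath="/rss/channel/item/enclosure/@url" />
<field column="enclosure_length" xpath="/rss/channel/item/enclosure/@length" />
```

When an item carries several enclosures, the values in the three fields would line up by position only if every enclosure supplies all three attributes, so that is worth checking against real feeds.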
Re: Solr PolyField
As a single record? Hmm, no.

An RSS feed has /rss/channel/ and then a lot of /rss/channel/item elements, right? Each /rss/channel/item is a new document in Solr. I started with the Solr RSS example, but changed it to have more fields, different fields, and to get the feed URL from a database.

So each /rss/channel/item is a document to be indexed, but each /rss/channel/item can have more than one enclosure tag.

Many thanks

On Jul 31, 2013, at 4:05 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

So you're trying to index an RSS feed as a single record, but you want to be able to search for and retrieve individual entries from within the feed? Is that the issue?

Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc. “The Science of Influence Marketing”
18 East 41st Street New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso meligalet...@gmail.com wrote:

These fields can be multiValued. According to the RSS standard that is not correct, but some sources do it and I'd like to grab it all. Is there any way to make that possible?

Once again, many thanks :)

On Jul 31, 2013, at 3:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

Luís,

Is there a reason why splitting this up into enclosure_type, enclosure_url, and enclosure_length would not work?

Michael Della Bitta

On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso meligalet...@gmail.com wrote:

Hi, I'm trying to index information from RSS feeds. In more detail, the RSS feed has something like:

<enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/>

With my current configuration this works, and I get a result like this:

"enclosure": [ "audio/mpeg", "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", "37521428" ],

BUT this is not the result I'm trying to reach. With it I can't reliably tell whether audio/mpeg is the type, the url, or the length.

I want to reach something like:

"enclosure": { "type": "audio/mpeg", "url": "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", "length": "37521428" },

So, as I understand it, this should be 3 fields inside another field, no?

Many thanks for the answer and the help.

On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote:

Nope. Solr fields are flat. Why do you want to do this? I'm asking because this might be an XY problem and there may be other possibilities.

Best
Erick

On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso meligalet...@gmail.com wrote:

Hi, I'm trying to create a field with multiple fields inside, that is:

"origin": { "htmlUrl": "http://www.gazzetta.it/", "streamId": "feed/http://www.gazzetta.it/rss/Home.xml", "title": "Gazzetta.it" },

I'd like to get something like this. Is that possible? I'm using Solr 4.4.0.

Thanks
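The split into enclosure_type, enclosure_url and enclosure_length suggested above could look roughly like this in schema.xml. This is only a sketch, not a tested configuration: it assumes the `string` field type from the stock Solr 4.x example schema, and it assumes every enclosure in a feed item carries all three attributes, so that values at the same position in the three multiValued fields belong to the same enclosure.

```xml
<!-- schema.xml fragment (sketch): one flat, multiValued field per enclosure attribute.
     The "string" type is assumed to be defined as in the Solr 4.x example schema. -->
<field name="enclosure_type"   type="string" indexed="true" stored="true" multiValued="true"/>
<field name="enclosure_url"    type="string" indexed="true" stored="true" multiValued="true"/>
<field name="enclosure_length" type="string" indexed="true" stored="true" multiValued="true"/>
```

Because each attribute is stored under its own name, a client can tell type, url and length apart without guessing from the value. If a feed omits an attribute on some enclosures, the positions no longer line up, which is the main weakness of this layout.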
Re: Solr PolyField
Hmm, ok. Is it possible to add static text to a field? That is, text that I write in the configuration, followed by the value of another field?

I saw something like CloneFieldProcessor, but when I start Solr it says that it could not find the class. I was trying to use processors to move one field to another. I saw this:

<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>

But when I try to use it, Solr says that it cannot find solr.FieldCopyProcessorFactory. I'm using Solr 4.4.0.

Thanks ;)

On Jul 31, 2013, at 4:16 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

OK,

Then I would suggest creating multiValued enclosure_type, etc. tags for searching, and then one string-typed field to store the JSON snippet you've been showing.

Michael Della Bitta
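On the question of combining static text from the configuration with another field: since the data arrives through the DataImportHandler, one option may be its TemplateTransformer, which fills a column from a template string that can reference other columns. The fragment below is a sketch only. The entity name `item`, the feed URL and the column `link_label` are placeholders made up for illustration, and `link` is assumed to be single-valued in the schema; with a multiValued column the template may receive a list rather than a plain string.

```xml
<!-- data-config.xml fragment (sketch). "item", the url value and "link_label"
     are hypothetical names; "link_label" would also need a field in schema.xml. -->
<entity name="item"
        processor="XPathEntityProcessor"
        url="http://example.com/feed.xml"
        forEach="/rss/channel/item"
        transformer="TemplateTransformer">
  <field column="link" xpath="/rss/channel/item/link"/>
  <!-- Static text written in the config, followed by the value of another column -->
  <field column="link_label" template="Source link: ${item.link}"/>
</entity>
```

The variable is written as `${entityName.columnName}`, so it has to match the `name` of the enclosing entity.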
Re: Solr PolyField
Ok, thanks. I will check it.

On Jul 31, 2013, at 5:08 PM, Jack Krupansky j...@basetechnology.com wrote:

See: https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html

I have more examples in my book.

-- Jack Krupansky

From: Luís Portela Afonso
Sent: Wednesday, July 31, 2013 11:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr PolyField

Hmm, ok. Is it possible to add static text to a field? That is, text that I write in the configuration, followed by the value of another field?

I saw something like CloneFieldProcessor, but when I start Solr it says that it could not find the class. I was trying to use processors to move one field to another. I saw this:

<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>

But when I try to use it, Solr says that it cannot find solr.FieldCopyProcessorFactory. I'm using Solr 4.4.0.

Thanks ;)
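The javadoc linked above is for CloneFieldUpdateProcessorFactory, not the FieldCopyProcessorFactory that Solr reported it could not find. A rough equivalent of the lastname/firstname snippet, using that class together with ConcatFieldUpdateProcessorFactory, might look like the following. Treat it as a sketch under assumptions: the chain name `build-fullname` is a placeholder, the field names are taken from the quoted snippet, and the order in which the copied values are joined should be checked on a test document.

```xml
<!-- solrconfig.xml fragment (sketch): the chain name "build-fullname" is a placeholder -->
<updateRequestProcessorChain name="build-fullname">
  <!-- Copy the values of both source fields into "fullname" -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <arr name="source">
      <str>lastname</str>
      <str>firstname</str>
    </arr>
    <str name="dest">fullname</str>
  </processor>
  <!-- Join the values now held in "fullname" into a single string -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">fullname</str>
    <str name="delimiter">, </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

For the chain to apply to documents coming from the DataImportHandler, the /dataimport request handler would presumably need `<str name="update.chain">build-fullname</str>` in its defaults, or the same parameter on the import URL.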
Solr rss indexation doubt
Hi,

I'm using Apache Solr to index RSS feeds. I'm successfully getting data (the URL, and whether the feed is active for indexing) from a database, and using that as the source of an entity to index the RSS data.

I'm trying to reach a particular result but can't get it. I'll try to explain with an example.

The RSS feed has something like:

<enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/>

In my schema.xml:

<dynamicField name="enclosure_*" type="string" indexed="false" stored="false" multiValued="true" />
<field name="enclosure" type="text" indexed="true" stored="true" multiValued="true" />
<copyField source="enclosure_*" dest="enclosure" />

In my data-config.xml:

<dataSource name="sql-ds" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db" user="user" password="pass" readOnly="true"/>
<dataSource name="url-ds" type="URLDataSource" />
<document>
  <entity name="src" rootEntity="false" dataSource="sql-ds"
          query="SELECT Feeds.IDFeed, Feeds.FeedUrl FROM meshapp.Feeds where Feeds.Active = 1">
    <!-- Field created by MeshApp Reader to identify the source -->
    <field column="IDFeed" name="source-id" />
    <entity name="xml" rootEntity="true" dataSource="url-ds" url='${src.FeedUrl}' onError="skip"
            processor="XPathEntityProcessor" forEach="/rss/channel/item | /rss/channel"
            transformer="DateFormatTransformer">
      <!-- Lot of fields -->
      <field column="enclosure_url" xpath="/rss/channel/item/enclosure/@url" />
      <field column="enclosure_length" xpath="/rss/channel/item/enclosure/@length" />
      <field column="enclosure_type" xpath="/rss/channel/item/enclosure/@type" />
    </entity>
  </entity>
</document>

This is working, and I get a result like this:

"enclosure": [ "audio/mpeg", "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", "37521428" ],

BUT this is not the result I'm trying to reach. I want to reach something like:

"enclosure": { "type": "audio/mpeg", "url": "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", "length": "37521428" },

So, as I understand it, this should be 3 fields inside another field, no?

I have tried something like the following (it doesn't work).

In my schema.xml (I think this doesn't make sense):

<field name="enclosure" type="html" indexed="false" stored="true" multiValued="true">
  <dynamicField name="enclosure_*" type="string" indexed="false" stored="true" multiValued="true" />
</field>

In my data-config.xml:

<field name="enclosure" column="enclosure"> -->
  <field column="enclosure_url" xpath="/rss/channel/item/enclosure/@url" />
  <field column="enclosure_length" xpath="/rss/channel/item/enclosure/@length" />
  <field column="enclosure_type" xpath="/rss/channel/item/enclosure/@type" />
</field>

Can you please help me?

Many thanks,
Luís Portela Afonso