Re: Updating data

2013-02-05 Thread Dikchant Sahi
If I understand it right, you want the JSON to contain only the new fields
and not the fields that have already been indexed/stored.

Check out Solr Atomic updates. Below are some links which might help.
http://wiki.apache.org/solr/Atomic_Updates
http://yonik.com/solr/atomic-updates/

Remember, it requires the fields to be stored.
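
For example, a minimal sketch of an atomic update posted as JSON (ids "1" and
"2" are taken from your sample; host/port and the /update handler assume the
default example setup):

curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' -d '
[
  {"id": "1", "new_field": {"set": "xvz"}},
  {"id": "2", "new_field": {"set": "xvz"}}
]'

Each "set" replaces just that field on the existing document; all other
stored fields are preserved.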

On Tue, Feb 5, 2013 at 12:35 PM, anurag.jain anurag.k...@gmail.com wrote:

 I have already indexed 180 records in the Solr index. All files were in
 JSON format.

 So the data was like:

 [
   {
     "id": 1,
     "first_name": "anurag",
     "last_name": "jain",
     ...
   },
   {
     "id": 2,
     "first_name": "abhishek",
     "last_name": "jain",
     ...
   }, ...
 ]


 Now I have to add a field to the data, like:

 [
   {
     "id": 1,
     "first_name": "anurag",
     "last_name": "jain",
     "new_field": "xvz",
     ...
   },
   {
     "id": 2,
     "first_name": "abhishek",
     "last_name": "jain",
     "new_field": "xvz",
     ...
   }, ...
 ]


 But I want my JSON file to look like this:

 [
   {
     "id": 1,
     "new_field": "xvz"
   },
   {
     "id": 2,
     "new_field": "xvz"
   }
 ]

 so that posting this file automatically updates the index to look like:

 [
   {
     "id": 1,
     "first_name": "anurag",
     "last_name": "jain",
     "new_field": "xvz",
     ...
   },
   {
     "id": 2,
     "first_name": "abhishek",
     "last_name": "jain",
     "new_field": "xvz",
     ...
   }, ...
 ]



 Any solutions? Please reply.






Re: Tokenized keywords

2013-01-21 Thread Dikchant Sahi
Tokenizers change how you search/index, not how you store. What I understand
is that you want to display the tokenized result always, and not just for
debugging purposes.

debugQuery has performance implications and should not be used for what you
are trying to achieve.

Basically, what you need is a way to store the filtered and lowercased tokens
in the 'modified' field. What I see as a solution: either ingest the
'original' field with your desired tokens directly instead of using
copyField, or write some custom code to store/index only the filtered and
lowercased result, e.g. a custom transformer can be explored if you are
using the Data Import Handler.
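
A minimal sketch of such a transformer (the class name and stop-word list are
hypothetical; it assumes the 'original' column feeds the 'modified' field and
that the class is referenced via the transformer attribute on the DIH
entity):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Hypothetical DIH transformer: lowercases the 'original' column and strips
// a few stop words before it is written to the 'modified' field.
public class TokenizingTransformer extends Transformer {
  private static final Set<String> STOP_WORDS =
      new HashSet<String>(Arrays.asList("for", "all", "the"));

  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object value = row.get("original");
    if (value != null) {
      StringBuilder out = new StringBuilder();
      for (String token : value.toString().toLowerCase().split("\\s+")) {
        if (!STOP_WORDS.contains(token)) {
          out.append(token).append(' ');
        }
      }
      row.put("modified", out.toString().trim());
    }
    return row;
  }
}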


On Mon, Jan 21, 2013 at 1:47 PM, Romita Saha
romita.s...@sg.panasonic.com wrote:

 Hi,

 I have a field defined in schema.xml named 'original'. I first copy this
 field to 'modified' and apply filters on the field 'modified'.

 <field name="original" type="string" indexed="true" stored="true"/>
 <field name="modified" type="text_general" indexed="true" stored="true"/>

 <copyField source="original" dest="modified"/>

 I want to display in my response as follows:

 "original": "Search for all the Laptops"
 "modified": "search laptop"

 Thanks and regards,
 Romita Saha

 Panasonic RD Center Singapore
 Blk 1022 Tai Seng Avenue #06-3530
 Tai Seng Ind. Est. Singapore 534415
 DID: (65) 6550 5383 FAX: (65) 6550 5459
 email: romita.s...@sg.panasonic.com



 From:   Mikhail Khludnev mkhlud...@griddynamics.com
 To: solr-user@lucene.apache.org,
 Date:   01/21/2013 03:48 PM
 Subject:Re: Tokenized keywords



 Romita,
 That's exactly what the debugQuery output shows. If you can't find it there,
 paste the output here and let's try to find it together. Also pay attention
 to the explainOther debug parameter and the analysis page in the admin UI.
 On 21.01.2013 10:50, Romita Saha romita.s...@sg.panasonic.com wrote:

  What I am trying to achieve is as follows.
 
  I query "Search for all the Laptops" and my tokenized keywords are
  "search laptop" (I apply a stopword filter to filter out words like
  "for", "all", "the", and I also use a lowercase filter).
  I want to display these tokenized keywords using debugQuery.
 
  Thanks and regards,
  Romita
 
 
 
  From:   Dikchant Sahi contacts...@gmail.com
  To: solr-user@lucene.apache.org,
  Date:   01/21/2013 02:26 PM
  Subject:Re: Tokenized keywords
 
 
 
  Can you please elaborate a bit more on what you are trying to achieve?

  Tokenizers work on the indexed field and don't affect how the values are
  displayed. The response value comes from the stored field. If you want to
  see how your query is being tokenized, you can do it using the analysis
  interface or enable debugQuery to see how your query is being formed.
 
 
  On Mon, Jan 21, 2013 at 11:06 AM, Romita Saha
  romita.s...@sg.panasonic.com wrote:
 
   Hi,
  
   I use some tokenizers to tokenize the query. I want to see the tokenized
   query words displayed in the response. Could you kindly help me do that.
  
   Thanks and regards,
   Romita
 
 





Re: Tokenized keywords

2013-01-20 Thread Dikchant Sahi
Can you please elaborate a bit more on what you are trying to achieve?

Tokenizers work on the indexed field and don't affect how the values are
displayed. The response value comes from the stored field. If you want to see
how your query is being tokenized, you can do it using the analysis interface
or enable debugQuery to see how your query is being formed.
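
For instance, a hedged example (hypothetical host and core, default search
field assumed):

http://localhost:8983/solr/select?q=Search+for+all+the+Laptops&debugQuery=true

The "debug" section of the response then contains "parsedquery", which shows
the analyzed tokens the query was reduced to.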


On Mon, Jan 21, 2013 at 11:06 AM, Romita Saha
romita.s...@sg.panasonic.com wrote:

 Hi,

 I use some tokenizers to tokenize the query. I want to see the tokenized
 query words displayed in the response. Could you kindly help me do that.

 Thanks and regards,
 Romita


Re: MultiValue

2013-01-17 Thread Dikchant Sahi
You just need to make the field multivalued:

<field name="last_name" type="string" indexed="true" stored="true" />
<field name="training_skill" type="string" indexed="true" stored="true"
       multiValued="true" />

The type should be set based on your search requirements.
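
As a rough sketch, posting the sample document to the JSON update handler
(host/port assumed from the default example setup, and assuming each skill is
a separate array element) indexes each element as a separate value:

curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' \
  -d '[{"last_name": "jain", "training_skill": ["c", "c++", "php", "java", ".net"]}]'

A facet on training_skill would then count c, c++, php, java and .net
individually.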

On Thu, Jan 17, 2013 at 11:27 PM, anurag.jain anurag.k...@gmail.com wrote:

 My JSON file looks like:

 [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

 Can you please suggest how I should declare the field in the schema for the
 training_skill field?



 Please reply. Urgent.








Re: MultiValue

2013-01-17 Thread Dikchant Sahi
You mean to say that the problem is with the JSON being ingested?

What you are trying to achieve: you want to split the values on commas and
index them as multiple values.

What problem are you facing in indexing JSON in the format Solr expects? If
you don't have control over it, you can probably try playing with custom
update processors.
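
As a rough sketch, a custom update request processor along these lines could
do the splitting (the class name is hypothetical; it would be registered in
an updateRequestProcessorChain in solrconfig.xml and referenced by the update
handler):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Hypothetical processor: splits each incoming training_skill value on
// commas, so "php,java,.net" becomes three separate values of the
// multivalued field.
public class SplitSkillsProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        if (doc.containsKey("training_skill")) {
          List<Object> split = new ArrayList<Object>();
          for (Object value : doc.getFieldValues("training_skill")) {
            for (String part : value.toString().split(",")) {
              split.add(part.trim());
            }
          }
          doc.setField("training_skill", split);
        }
        super.processAdd(cmd);
      }
    };
  }
}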




On Fri, Jan 18, 2013 at 12:31 AM, anurag.jain anurag.k...@gmail.com wrote:

 [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

 Actually I want it tokenized as: c, c++, php, java, .net

 so that through this I can use them as facets.

 But the problem is in the list:
 "training_skill": ["c", "c++", "php,java,.net"]
 where the last element is a single comma-joined string.









Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Dikchant Sahi
As someone in the forum correctly said: if all previous Solr releases were
evolutionary, Solr 4.0 is revolutionary. It has lots of improvements over the
previous releases, like NoSQL features, atomic updates, cloud features and a
lot more.

I believe Solr 4.0 would be the right migration target.

Can someone in the forum provide a reason to migrate to 3.6.2 and not 4.0?

On Fri, Jan 4, 2013 at 5:16 PM, vijeshnair vijeshkn...@gmail.com wrote:

 We are starting a new e-com application this month, for which I am trying to
 identify the right Solr release. We were using 3.4 in our previous project,
 but I have read in multiple blogs and forums about the improvements that
 Solr 4 has in terms of efficient memory management, fewer OOMs, etc. So my
 question would be: can I start using Solr 4 for my new project? Why is
 Apache keeping both the 3.6.2 and 4.0 releases in the downloads? Are there
 any major changes in 4.0 compared to 3.x that I should study before getting
 into 4.0? Please help, so that I can propose 4.0 to my team.

 Thanks
 Vijesh Nair






Re: Solr atomic update of multi-valued field

2012-12-19 Thread Dikchant Sahi
Hi Erick,

The name field is stored. I experience the problem only when I update the
multiValued field with multiple values, like:

<field name="skills" update="set">solr</field>
<field name="skills" update="set">lucene</field>

It works perfectly when I set a single value for the multiValued field, like:

<field name="skills" update="set">solr</field>
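
For comparison, a sketch of the same update in JSON, where "set" takes an
array and replaces all values of the multivalued field at once (host/port and
handler path assume the default example setup):

curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' \
  -d '[{"id": "1", "skills": {"set": ["solr", "lucene"]}}]'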

Thanks,
Dikchant

On Wed, Dec 19, 2012 at 6:25 PM, Erick Erickson erickerick...@gmail.com wrote:

 First question: Is the name field stored (stored="true")? If it isn't, that
 would explain your problems with that field. _All_ relevant fields (i.e.
 everything that is not a destination of a copyField) need to be stored for
 atomic updates to work.

 Your second problem I'm not sure about. I remember some JIRAs about
 multivalued fields and atomic updates; you might get some info from the
 JIRAs here: https://issues.apache.org/jira/browse/SOLR

 but updating multiValued fields _should_ work...

 Best
 Erick


 On Tue, Dec 18, 2012 at 2:20 AM, Dikchant Sahi contacts...@gmail.com
 wrote:

  Hi,
 
  Does Solr 4.0 allow updating the values of a multi-valued field? Say I
  have a list of values for the skills field, like "java", "j2ee", and I
  want to change it to "solr", "lucene".
 
  I was trying to play with atomic updates and below is my observation:

  I have the following document in my index:
  <doc>
    <str name="id">1</str>
    <str name="name">Dikchant</str>
    <str name="profession">software engineer</str>
    <arr name="skills">
      <str>java</str>
      <str>j2ee</str>
    </arr>
  </doc>
 
  To update the skills to "solr", "lucene", I indexed a document as follows:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="skills" update="set">solr</field>
      <field name="skills" update="set">lucene</field>
    </doc>
  </add>

  The document added to the index is as follows:
  <doc>
    <str name="id">1</str>
    <arr name="skills">
      <str>{set=solr}</str>
      <str>{set=lucene}</str>
    </arr>
  </doc>
 
  This is not what I was looking for. I found 2 issues:
  1. The value of the name field was lost.
  2. The skills field had junk values like {set=solr}.
  Then, to achieve my goal, I tried something different. I tried setting a
  single-valued field with the update="set" parameter to the same value, and
  also provided the values of the multi-valued field as we do while adding a
  new document:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="name" update="set">Dikchant</field>
      <field name="skills">solr</field>
      <field name="skills">lucene</field>
    </doc>
  </add>
 
  With this, the index looks as follows:
  <doc>
    <str name="id">1</str>
    <str name="name">Dikchant</str>
    <str name="profession">software engineer</str>
    <arr name="skills">
      <str>solr</str>
      <str>lucene</str>
    </arr>
  </doc>

  The values of the multivalued field are changed and the values of the
  other fields are not deleted.

  The question that comes to my mind is: does Solr 4.0 allow updates of
  multi-valued fields? If yes, is this how it works, or am I doing something
  wrong?
 
  Regards,
  Dikchant
 



Re: Update / replication of offline indexes

2012-12-17 Thread Dikchant Sahi
Thanks Erick and Upayavira! This answers my question.


On Mon, Dec 17, 2012 at 8:05 AM, Erick Erickson erickerick...@gmail.com wrote:

 See the very last line here:
 http://wiki.apache.org/solr/MergingSolrIndexes

 Short answer is that merging will lead to duplicate documents, even with
 uniqueKeys defined.

 So you're really kind of stuck handling this outside of merge, either by
 shipping the list of overwritten docs and deleting them from the base index,
 or by shipping the JSON/XML format and indexing those. Of the two, I'd think
 the latter is easiest/least prone to surprises, especially since you could
 re-run the indexing as many times as necessary.

 The UniqueKey bits are only guaranteed to overwrite older docs when
 indexing, not merging.
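
 For reference, a rough sketch of the "delete overwritten docs, then merge"
 route using the CoreAdmin MERGEINDEXES action (host, core name, ids, and
 index path are all hypothetical):

 curl 'http://localhost:8983/solr/base/update?commit=true' \
   -H 'Content-type:application/json' \
   -d '{"delete": {"query": "id:(doc1 OR doc2)"}}'

 curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=base&indexDir=/path/to/delta/index'

 followed by a commit on the target core.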

 Best
 Erick


 On Thu, Dec 13, 2012 at 3:17 PM, Dikchant Sahi contacts...@gmail.com
 wrote:

  Hi Alex,
 
  You got my point right. What I see is that merge adds duplicate documents.
  Is there a way to overwrite existing documents in one core from another?
  Can a merge operation lead to data corruption, say in the case when the
  core on the client had uncommitted changes?

  What would be a better solution for my requirement: merge, or indexing
  XML/JSON?
 
  Regards,
  Dikchant
 
  On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch
   arafa...@gmail.com wrote:
 
    Not sure I fully understood this, and maybe you already covered that by
    'merge', but if you know what you gave the client last time, you can just
    build a differential as a second core, then on the client mount that
    second core and merge it into the first one (e.g. with DIH).
  
   Just a thought.
  
   Regards,
  Alex.
  
   Personal blog: http://blog.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all at
   once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
  
  
  
   On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi contacts...@gmail.com
   wrote:
  
Hi Erick,
   
Sorry for creating the confusion. By slave, I mean the indexes on
  client
machine will be replica of the master and in not same as the slave in
master-slave model. Below is the detail:
   
The system is being developed to support search facility on 1000s of
system, a majority of which will be offline.
   
The idea is that we will have a search system which will be sold
on subscription basis. For each of the subscriber, we will copy the
   master
index to their local machine, over a drive or CD. Now, if a
 subscriber
comes after 2 months and want the updates, we just want to provide
 the
deltas for 2 month as the volume of data is huge. For this we can
 think
   of
two approaches:
1. Fetch the documents which are less than 2 months old  in JSON
 format
from master Solr. Copy it to the subscriber machine
and index those documents. (copy through cd / memory sticks)
2. Create separate indexes for each month on our master machine. Copy
  the
indexes to the client machine and merge. Prior to merge we need to
  delete
records which the new index has, to avoid duplicates.
   
As long as the setup is new, we will copy the complete index and
  restart
Solr. We are not sure of the best approach for copying the deltas.
   
Thanks,
Dikchant
   
   
   
On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson 
  erickerick...@gmail.com
wrote:
   
 This is somewhat confusing. You say that box2 is the slave, yet
  they're
not
 connected? Then you need to copy the solr home/data index from
 box
  1
   to
 box 2 manually (I'd have box2 solr shut down at the time) and
 restart
Solr.

 Why can't the boxes be connected? That's a much simpler way of
 going
about
 it.

 Best
 Erick


 On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi 
  contacts...@gmail.com
 wrote:

  Hi Walter,
 
  Thanks for the response.
 
  Commit will help to reflect changes on Box1. We are able to
 achieve
this.
  We want the changes to reflect in Box2.
 
  We have two indexes. Say
  Box1: Master  DB has been setup. Data Import runs on this.
  Box2: Slave running.
 
  We want all the updates on Box1 to be merged/present in index on
   Box2.
 Both
  the boxes are not connected over n/w. How can be achieve this.
 
  Please let me know, if am not clear.
 
  Thanks again!
 
  Regards,
  Dikchant
 
  On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood 
 wun...@wunderwood.org
  wrote:
 
   You do not need to manage online and offline indexes. Commit
 when
   you
 are
   done with your updates and Solr will take care of it for you.
 The
 changes
   are not live until you commit.
  
   wunder
  
   On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
  
Hi,
   
How can

Solr atomic update of multi-valued field

2012-12-17 Thread Dikchant Sahi
Hi,

Does Solr 4.0 allow updating the values of a multi-valued field? Say I have a
list of values for the skills field, like "java", "j2ee", and I want to
change it to "solr", "lucene".

I was trying to play with atomic updates and below is my observation:

I have the following document in my index:
<doc>
  <str name="id">1</str>
  <str name="name">Dikchant</str>
  <str name="profession">software engineer</str>
  <arr name="skills">
    <str>java</str>
    <str>j2ee</str>
  </arr>
</doc>

To update the skills to "solr", "lucene", I indexed a document as follows:

<add>
  <doc>
    <field name="id">1</field>
    <field name="skills" update="set">solr</field>
    <field name="skills" update="set">lucene</field>
  </doc>
</add>

The document added to the index is as follows:
<doc>
  <str name="id">1</str>
  <arr name="skills">
    <str>{set=solr}</str>
    <str>{set=lucene}</str>
  </arr>
</doc>

This is not what I was looking for. I found 2 issues:
1. The value of the name field was lost.
2. The skills field had junk values like {set=solr}.

Then, to achieve my goal, I tried something different. I tried setting a
single-valued field with the update="set" parameter to the same value, and
also provided the values of the multi-valued field as we do while adding a
new document:

<add>
  <doc>
    <field name="id">1</field>
    <field name="name" update="set">Dikchant</field>
    <field name="skills">solr</field>
    <field name="skills">lucene</field>
  </doc>
</add>

With this, the index looks as follows:
<doc>
  <str name="id">1</str>
  <str name="name">Dikchant</str>
  <str name="profession">software engineer</str>
  <arr name="skills">
    <str>solr</str>
    <str>lucene</str>
  </arr>
</doc>

The values of the multivalued field are changed and the values of the other
fields are not deleted.

The question that comes to my mind is: does Solr 4.0 allow updates of
multi-valued fields? If yes, is this how it works, or am I doing something
wrong?

Regards,
Dikchant


Re: Update / replication of offline indexes

2012-12-13 Thread Dikchant Sahi
Hi Alex,

You got my point right. What I see is that merge adds duplicate documents. Is
there a way to overwrite existing documents in one core from another? Can a
merge operation lead to data corruption, say in the case when the core on the
client had uncommitted changes?

What would be a better solution for my requirement: merge, or indexing
XML/JSON?

Regards,
Dikchant

On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Not sure I fully understood this, and maybe you already covered that by
 'merge', but if you know what you gave the client last time, you can just
 build a differential as a second core, then on the client mount that second
 core and merge it into the first one (e.g. with DIH).

 Just a thought.

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



 On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi contacts...@gmail.com
 wrote:

  Hi Erick,
 
  Sorry for creating the confusion. By slave, I mean the indexes on client
  machine will be replica of the master and in not same as the slave in
  master-slave model. Below is the detail:
 
  The system is being developed to support search facility on 1000s of
  system, a majority of which will be offline.
 
  The idea is that we will have a search system which will be sold
  on subscription basis. For each of the subscriber, we will copy the
 master
  index to their local machine, over a drive or CD. Now, if a subscriber
  comes after 2 months and want the updates, we just want to provide the
  deltas for 2 month as the volume of data is huge. For this we can think
 of
  two approaches:
  1. Fetch the documents which are less than 2 months old  in JSON format
  from master Solr. Copy it to the subscriber machine
  and index those documents. (copy through cd / memory sticks)
  2. Create separate indexes for each month on our master machine. Copy the
  indexes to the client machine and merge. Prior to merge we need to delete
  records which the new index has, to avoid duplicates.
 
  As long as the setup is new, we will copy the complete index and restart
  Solr. We are not sure of the best approach for copying the deltas.
 
  Thanks,
  Dikchant
 
 
 
  On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   This is somewhat confusing. You say that box2 is the slave, yet they're
  not
   connected? Then you need to copy the solr home/data index from box 1
 to
   box 2 manually (I'd have box2 solr shut down at the time) and restart
  Solr.
  
   Why can't the boxes be connected? That's a much simpler way of going
  about
   it.
  
   Best
   Erick
  
  
   On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com
   wrote:
  
Hi Walter,
   
Thanks for the response.
   
Commit will help to reflect changes on Box1. We are able to achieve
  this.
We want the changes to reflect in Box2.
   
We have two indexes. Say
Box1: Master  DB has been setup. Data Import runs on this.
Box2: Slave running.
   
We want all the updates on Box1 to be merged/present in index on
 Box2.
   Both
the boxes are not connected over n/w. How can be achieve this.
   
Please let me know, if am not clear.
   
Thanks again!
   
Regards,
Dikchant
   
On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood 
   wun...@wunderwood.org
wrote:
   
 You do not need to manage online and offline indexes. Commit when
 you
   are
 done with your updates and Solr will take care of it for you. The
   changes
 are not live until you commit.

 wunder

 On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:

  Hi,
 
  How can we do delta update of offline indexes?
 
  We have the master index on which data import will be done. The
  index
  directory will be copied to slave machine in case of full update,
through
  CD as the  slave/client machine is offline.
  So, what should be the approach for getting the delta to the
  slave. I
can
  think of two approaches.
 
  1.Create separate indexes of the delta on the master machine,
 copy
  it
to
  the slave machine and merge. Before merging the indexes on the
  client
  machine, delete all the updated and deleted documents in client
   machine
  else merge will add duplicates. So along with the index, we need
 to
  transfer the list of documents which has been updated/deleted.
 
  2. Extract all the documents which has changed since a particular
   time
in
  XML/JSON and index it in client machine.
 
  The size of indexes are huge, so we cannot rollover index
  everytime.
 
  Please help me with your take and challenges you see in the above
  approaches. Please suggest if you think of any other better

Re: Update / replication of offline indexes

2012-12-13 Thread Dikchant Sahi
Yes, we have a uniqueId defined, but merge adds two documents with the same
id. As per my understanding this is how Solr behaves. Correct me if I am
wrong.

On Fri, Dec 14, 2012 at 2:25 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Do you have IDs defined? How do you expect Solr to know they are duplicate
 records? Maybe the issue is there somewhere.

 Regards,
  Alex
 On 13 Dec 2012 15:17, Dikchant Sahi contacts...@gmail.com wrote:

  Hi Alex,
 
   You got my point right. What I see is that merge adds duplicate documents.
   Is there a way to overwrite existing documents in one core from another?
   Can a merge operation lead to data corruption, say in the case when the
   core on the client had uncommitted changes?

   What would be a better solution for my requirement: merge, or indexing
   XML/JSON?
 
  Regards,
  Dikchant
 
  On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch
   arafa...@gmail.com wrote:
 
   Not sure I fully understood this and maybe you already cover that by
   'merge', but if you know what you gave the client last time, you can
 just
   build a differential as a second core, then on client mount that second
   core and merge it into the first one (e.g. with DIH).
  
   Just a thought.
  
   Regards,
  Alex.
  
   Personal blog: http://blog.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all at
   once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
  
  
  
   On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi contacts...@gmail.com
   wrote:
  
Hi Erick,
   
Sorry for creating the confusion. By slave, I mean the indexes on
  client
machine will be replica of the master and in not same as the slave in
master-slave model. Below is the detail:
   
The system is being developed to support search facility on 1000s of
system, a majority of which will be offline.
   
The idea is that we will have a search system which will be sold
on subscription basis. For each of the subscriber, we will copy the
   master
index to their local machine, over a drive or CD. Now, if a
 subscriber
comes after 2 months and want the updates, we just want to provide
 the
deltas for 2 month as the volume of data is huge. For this we can
 think
   of
two approaches:
1. Fetch the documents which are less than 2 months old  in JSON
 format
from master Solr. Copy it to the subscriber machine
and index those documents. (copy through cd / memory sticks)
2. Create separate indexes for each month on our master machine. Copy
  the
indexes to the client machine and merge. Prior to merge we need to
  delete
records which the new index has, to avoid duplicates.
   
As long as the setup is new, we will copy the complete index and
  restart
Solr. We are not sure of the best approach for copying the deltas.
   
Thanks,
Dikchant
   
   
   
On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson 
  erickerick...@gmail.com
wrote:
   
 This is somewhat confusing. You say that box2 is the slave, yet
  they're
not
 connected? Then you need to copy the solr home/data index from
 box
  1
   to
 box 2 manually (I'd have box2 solr shut down at the time) and
 restart
Solr.

 Why can't the boxes be connected? That's a much simpler way of
 going
about
 it.

 Best
 Erick


 On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi 
  contacts...@gmail.com
 wrote:

  Hi Walter,
 
  Thanks for the response.
 
  Commit will help to reflect changes on Box1. We are able to
 achieve
this.
  We want the changes to reflect in Box2.
 
  We have two indexes. Say
  Box1: Master  DB has been setup. Data Import runs on this.
  Box2: Slave running.
 
  We want all the updates on Box1 to be merged/present in index on
   Box2.
 Both
  the boxes are not connected over n/w. How can be achieve this.
 
  Please let me know, if am not clear.
 
  Thanks again!
 
  Regards,
  Dikchant
 
  On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood 
 wun...@wunderwood.org
  wrote:
 
   You do not need to manage online and offline indexes. Commit
 when
   you
 are
   done with your updates and Solr will take care of it for you.
 The
 changes
   are not live until you commit.
  
   wunder
  
   On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
  
Hi,
   
How can we do delta update of offline indexes?
   
We have the master index on which data import will be done.
 The
index
directory will be copied to slave machine in case of full
  update,
  through
CD as the  slave/client machine is offline.
So, what should be the approach for getting the delta to the
slave. I
  can
think of two approaches

Re: Update / replication of offline indexes

2012-12-12 Thread Dikchant Sahi
Hi Erick,

Sorry for creating the confusion. By "slave", I mean that the index on the
client machine will be a replica of the master, and is not the same as the
slave in the master-slave model. Below is the detail:

The system is being developed to support a search facility on 1000s of
systems, a majority of which will be offline.

The idea is that we will have a search system which will be sold on a
subscription basis. For each of the subscribers, we will copy the master
index to their local machine, over a drive or CD. Now, if a subscriber comes
after 2 months and wants the updates, we just want to provide the deltas for
those 2 months, as the volume of data is huge. For this we can think of two
approaches:
1. Fetch the documents which are less than 2 months old in JSON format from
the master Solr, copy them to the subscriber machine (through CD / memory
sticks), and index those documents.
2. Create separate indexes for each month on our master machine, copy the
indexes to the client machine and merge. Prior to the merge we need to delete
the records which the new index has, to avoid duplicates.

As long as the setup is new, we will copy the complete index and restart
Solr. We are not sure of the best approach for copying the deltas.
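
One sketch of approach 1, assuming the documents carry a last_modified date
field (the field name and host are assumptions):

http://master:8983/solr/select?q=last_modified:[NOW-2MONTHS%20TO%20NOW]&wt=json&rows=1000000

The "docs" array from the response can be saved to a file, carried over on
the CD, and posted to the subscriber's /update handler.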

Thanks,
Dikchant



On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com wrote:

 This is somewhat confusing. You say that box2 is the slave, yet they're not
 connected? Then you need to copy the <solr home>/data index from box 1 to
 box 2 manually (I'd have box2's Solr shut down at the time) and restart Solr.

 Why can't the boxes be connected? That's a much simpler way of going about
 it.

 Best
 Erick


 On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com
 wrote:

  Hi Walter,
 
  Thanks for the response.
 
  Commit will help to reflect changes on Box1. We are able to achieve this.
  We want the changes to reflect in Box2.
 
  We have two indexes. Say:
  Box1: Master & DB have been set up. Data import runs on this.
  Box2: Slave running.

  We want all the updates on Box1 to be merged/present in the index on Box2.
  Both boxes are not connected over the network. How can we achieve this?

  Please let me know if I am not clear.
 
  Thanks again!
 
  Regards,
  Dikchant
 
  On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood 
 wun...@wunderwood.org
  wrote:
 
   You do not need to manage online and offline indexes. Commit when you
 are
   done with your updates and Solr will take care of it for you. The
 changes
   are not live until you commit.
  
   wunder
  
   On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
  
Hi,
   
How can we do delta update of offline indexes?
   
We have the master index on which data import will be done. The index
directory will be copied to slave machine in case of full update,
  through
CD as the  slave/client machine is offline.
So, what should be the approach for getting the delta to the slave. I
  can
think of two approaches.
   
1.Create separate indexes of the delta on the master machine, copy it
  to
the slave machine and merge. Before merging the indexes on the client
machine, delete all the updated and deleted documents in client
 machine
else merge will add duplicates. So along with the index, we need to
transfer the list of documents which has been updated/deleted.
   
2. Extract all the documents which has changed since a particular
 time
  in
XML/JSON and index it in client machine.
   
The size of indexes are huge, so we cannot rollover index everytime.
   
Please help me with your take and challenges you see in the above
approaches. Please suggest if you think of any other better approach.
   
Thanks a ton!
   
Regards,
Dikchant
  
   --
   Walter Underwood
   wun...@wunderwood.org
  
  
  
  
 



Re: Update multiple documents

2012-12-11 Thread Dikchant Sahi
My intention is to allow search on person names in the second index as well.
If we use personId in the second index, is there a way to achieve that?

Yes, we are looking for a join kind of feature.

Thanks!
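
A sketch of a cross-core join query, assuming the mapping index stores
personId in a person_id field and both cores run in the same Solr instance
(the field and core names here are hypothetical):

http://localhost:8983/solr/mapping/select?q={!join fromIndex=person from=personId to=person_id}person_name:Michael

This would return mapping documents for persons whose name matches, without
duplicating the name into the mapping index.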

On Wed, Dec 12, 2012 at 8:31 AM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 But is that the best approach? If you use personIds in your second index,
 then you don't have to do that. Maybe you are after joins in Solr?

 Otis
 --
 SOLR Performance Monitoring - http://sematext.com/spm
 On Dec 11, 2012 1:21 PM, Dikchant Sahi contacts...@gmail.com wrote:

  Hi,
 
  We have two sets of related indexes:

  Index1: person (personId, person_name, field2, field3)
  Index2: mapping (id, fieldx, fieldy, person)

  Whenever a person name changes, we need to update both indexes. For the
  person index, we can update the person name as we have personId, which is
  the uniqueKey. How can we update the person names in index2?
 
  Eg:
  Index1: person(001, Micheal Jackson, value1, value2)

  Index2: mapping(1234, Thriller, Micheal Jackson)
          mapping(1235, Billy Jean, Micheal Jackson)

  "Micheal" Jackson changes to "Michael" Jackson.
 
  What would be the best approach to this problem?
 
  Thanks,
  Dikchant
 



Re: Update / replication of offline indexes

2012-12-10 Thread Dikchant Sahi
Hi Walter,

Thanks for the response.

Commit will help to reflect changes on Box1. We are able to achieve this.
We want the changes to reflect in Box2.

We have two indexes. Say:
Box1: Master & DB have been set up. Data import runs on this.
Box2: Slave running.

We want all the updates on Box1 to be merged/present in the index on Box2.
Both boxes are not connected over the network. How can we achieve this?

Please let me know if I am not clear.

Thanks again!

Regards,
Dikchant

On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote:

 You do not need to manage online and offline indexes. Commit when you are
 done with your updates and Solr will take care of it for you. The changes
 are not live until you commit.

 wunder

 On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:

  Hi,
 
  How can we do a delta update of offline indexes?

  We have the master index on which data import will be done. The index
  directory will be copied to the slave machine in the case of a full
  update, through CD, as the slave/client machine is offline.
  So what should be the approach for getting the delta to the slave? I can
  think of two approaches.

  1. Create separate indexes of the delta on the master machine, copy them
  to the slave machine and merge. Before merging the indexes on the client
  machine, delete all the updated and deleted documents on the client
  machine, else the merge will add duplicates. So along with the index, we
  need to transfer the list of documents which have been updated/deleted.

  2. Extract all the documents which have changed since a particular time in
  XML/JSON and index them on the client machine.

  The size of the indexes is huge, so we cannot roll over the index every
  time.

  Please help me with your take and the challenges you see in the above
  approaches. Please suggest if you think of any other better approach.
 
  Thanks a ton!
 
  Regards,
  Dikchant

 --
 Walter Underwood
 wun...@wunderwood.org






Re: multiple indexes?

2012-11-30 Thread Dikchant Sahi
Multiple indexes can be set up using the multi-core feature of Solr.

Below are the steps:
1. Add the core name and storage location of the core to
the $SOLR_HOME/solr.xml file:

  <cores adminPath="/admin/cores" defaultCoreName="core-name1">
    <core name="core-name1" instanceDir="core-dir1" />
    <core name="core-name2" instanceDir="core-dir2" />
  </cores>

2. Create the core directories specified, with the following sub-directories
in each:
- conf: contains the configs and schema definition
- lib: contains the required libraries
- data: will be created automatically on the first run; this contains the
actual index.

While indexing the docs, you specify the core name in the URL as follows:
  http://host:port/solr/core-name/update?parameters

Similarly you do while querying.
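
For instance, a rough sketch against the cores above (host/port assumed from
the default example setup):

curl 'http://localhost:8983/solr/core-name1/update?commit=true' \
  -H 'Content-type:application/json' -d '[{"id": "1"}]'

curl 'http://localhost:8983/solr/core-name2/select?q=*:*'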

Please refer to the Solr wiki; it has the complete details.

Hope this helps!

- Dikchant

On Sat, Dec 1, 2012 at 10:41 AM, Joe Zhang smartag...@gmail.com wrote:

 May I ask: how to set up multiple indexes, and specify which index to send
 the docs to at indexing time, and later on, how to specify which index to
 work with?

 A related question: what is the storage location and structure of solr
 indexes?

 Thanks in advance, guys!

 Joe.



Re: solr issue with seaching words

2012-09-04 Thread Dikchant Sahi
Try debugging it using the analysis page, or by running the query in debug
mode (debugQuery=true).

In the analysis page, add 'RCA-Jack/' on the index side and 'jacke' on the
query side. This might help you understand the behavior.

If you are still unable to debug it, some additional information would be
required to help.
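
If the /analysis/field handler from the example solrconfig is enabled, a
rough equivalent over HTTP (the field type name is an assumption):

http://localhost:8983/solr/analysis/field?analysis.fieldtype=text_general&analysis.fieldvalue=RCA-Jack/&analysis.query=jacke&analysis.showmatch=true

The response shows the token stream after each analyzer stage, with matches
highlighted.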

On Tue, Sep 4, 2012 at 3:38 PM, zainu zainu...@gmail.com wrote:

 I am facing a strange problem. I am searching for the word "jacke" but Solr
 also returns results where my description contains 'RCA-Jack/'. If I search
 "jacka" or "jackc" or "jackd", it works fine and does not return any
 result, which is what I am expecting in this case.

 Only when there is "jacke" does it return the result with 'RCA-Jack/'. So
 there seems to be some kind of relationship between "e" and "/", and it
 considers "e" as "/".

 Any help?






Re: Search results not returned for a str field

2012-07-20 Thread Dikchant Sahi
defaultSearchField is the field that is queried if you don't explicitly
specify the fields to query on.

Please refer to the below link:
http://wiki.apache.org/solr/SchemaXml
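
For instance, with <defaultSearchField>name</defaultSearchField> in the
schema, these two queries are equivalent (host and core from your example):

http://localhost:8983/solr/core0/select?q=iPod
http://localhost:8983/solr/core0/select?q=name:iPod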

On Sat, Jul 21, 2012 at 12:56 AM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Hello, Lakshmi,

 The issue is that the fieldType you've assigned to the fields in your
 schema does not perform any analysis on the string before indexing it, so
 it will only do exact matches. If you want to do matches against portions
 of the field value, use one of the text types that come in the default
 schema.
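
 For example, a minimal sketch of that change (text_general ships in the
 default example schema; a re-index is needed afterwards):

 <field name="name" type="text_general" indexed="true" stored="true"
        multiValued="false" />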

 Michael Della Bitta

 
 Appinions, Inc. -- Where Influence Isn’t a Game.
 http://www.appinions.com


 On Fri, Jul 20, 2012 at 3:18 PM, Lakshmi Bhargavi
 lakshmi.bharg...@gmail.com wrote:
  Hi,

  I have the following configuration:

  <?xml version="1.0" ?>
  <schema name="example core zero" version="1.1">
    <types>
      <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
                 omitNorms="true"/>
    </types>

    <fields>
      <field name="id"    type="string" indexed="true" stored="true"
             multiValued="false" required="true"/>
      <field name="type"  type="string" indexed="true" stored="true"
             multiValued="false" />
      <field name="name"  type="string" indexed="true" stored="true"
             multiValued="false" />
      <field name="core0" type="string" indexed="true" stored="true"
             multiValued="false" />
    </fields>

    <uniqueKey>id</uniqueKey>

    <defaultSearchField>name</defaultSearchField>

    <solrQueryParser defaultOperator="OR"/>
  </schema>

  I am also attaching the solr config file
  http://lucene.472066.n3.nabble.com/file/n3996313/solrconfig.xml
  solrconfig.xml

  I indexed a document:

  <add><doc>
    <field name="id">MA147LL/A</field>
    <field name="name">Apple 60 GB iPod with Video Playback Black</field>
  </doc></add>

  When I do a wildcard search, the results are returned:
  http://localhost:8983/solr/select?q=*:*

  <?xml version="1.0" encoding="UTF-8" ?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <result name="response" numFound="1" start="0">
      <doc>
        <str name="id">MA147LL/A</str>
        <str name="name">Apple 60 GB iPod with Video Playback Black</str>
      </doc>
    </result>
  </response>

  but the results are not returned for a specific query:
  http://localhost:8983/solr/core0/select?q=iPod

  <?xml version="1.0" encoding="UTF-8" ?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">5</int>
    </lst>
    <result name="response" numFound="0" start="0" />
  </response>

  Could someone please let me know what is wrong? Also, it would be very
  helpful if someone could explain the significance of the
  defaultSearchField.
 
  Thanks,
  lakshmi
 
 
 
 



Re: NGram for misspelt words

2012-07-18 Thread Dikchant Sahi
You are creating grams only while indexing and not while querying, hence
'ludlwo' would not match. Your analyzer will create the following grams while
indexing 'ludlow': "lu lud ludl ludlo ludlow", and none of them matches
'ludlwo'.

Either you need to create grams while querying as well, or use edit distance.
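
A rough sketch of a field type that grams both sides (the parameters are
illustrative; query-side grams make matching very permissive, so expect more
noisy matches that you will have to rely on scoring to rank down):

<fieldType name="nGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

With this, 'ludlwo' produces grams such as "lu", "ud" and "dl" that overlap
with the indexed grams of 'ludlow', so the document can still match.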

On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar yhus...@firstam.com wrote:




 I have configured NGram indexing for some fields.

 Say I search for the city "Ludlow": I get the results (normal search).

 If I search for "Ludlo" (with "w" omitted) I get the results.

 If I search for "Ludl" (with "ow" omitted) I still get the results.

 I know that they are all partial strings of the main string, hence NGram
 works perfectly.

 But when I type in "Ludlwo" (misspelt, characters "o" and "w" interchanged)
 I don't get any results. It should ideally match "Ludl" and provide the
 results.

 I am not looking for edit-distance-based spell correctors. How can I make
 the above NGram-based search work?

 Here is my schema.xml (NGram field type):

 <fieldType name="nGram" class="solr.TextField" positionIncrementGap="100"
            stored="false" multiValued="true">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <!-- potentially word delimiter, synonym filter, stop words, NOT
          stemming -->
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
             maxGramSize="15" side="front" />
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <!-- potentially word delimiter, synonym filter, stop words, NOT
          stemming -->
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>





Re: NGram for misspelt words

2012-07-18 Thread Dikchant Sahi
Have you tried the analysis window to debug?

I believe you are doing something wrong in the fieldType.

On Wed, Jul 18, 2012 at 8:07 PM, Husain, Yavar yhus...@firstam.com wrote:

 Thanks Sahi. I have replaced my EdgeNGramFilterFactory with
 NGramFilterFactory, as I need substrings not just at the front or back but
 anywhere.
 You are right, I put the same NGramFilterFactory in both query and index;
 however, now it does not return any results, not even the basic ones.

 -Original Message-
 From: Dikchant Sahi [mailto:contacts...@gmail.com]
 Sent: Wednesday, July 18, 2012 7:54 PM
 To: solr-user@lucene.apache.org
 Subject: Re: NGram for misspelt words

 You are creating grams only while indexing and not while querying, hence
 'ludlwo' would not match. Your analyzer will create the following grams
 while indexing 'ludlow': "lu lud ludl ludlo ludlow", and none of them
 matches 'ludlwo'.

 Either you need to create grams while querying as well, or use edit
 distance.

 On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar yhus...@firstam.com
 wrote:

 
 
 
  I have configured NGram indexing for some fields.

  Say I search for the city "Ludlow": I get the results (normal search).

  If I search for "Ludlo" (with "w" omitted) I get the results.

  If I search for "Ludl" (with "ow" omitted) I still get the results.

  I know that they are all partial strings of the main string, hence NGram
  works perfectly.

  But when I type in "Ludlwo" (misspelt, characters "o" and "w"
  interchanged) I don't get any results. It should ideally match "Ludl" and
  provide the results.

  I am not looking for edit-distance-based spell correctors. How can I make
  the above NGram-based search work?
 
  Here is my schema.xml (NGram field type):

  <fieldType name="nGram" class="solr.TextField" positionIncrementGap="100"
             stored="false" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- potentially word delimiter, synonym filter, stop words, NOT
           stemming -->
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
              maxGramSize="15" side="front" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- potentially word delimiter, synonym filter, stop words, NOT
           stemming -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 
 
 



Re: Big Data Analysis and Management - 2 day Workshop

2012-05-23 Thread Dikchant Sahi
Hi Manish,

The attachment seems to be missing. Would you mind sharing it again?

I am a Search Engineer based in Bangalore and would be interested in
attending the workshop.

Best Regards,
Dikchant Sahi

On Thu, May 24, 2012 at 10:22 AM, Manish Bafna manish.bafna...@gmail.com wrote:

 Dear Friend,
 We are organizing a workshop on Big Data. Here are the details.
 Please forward it to your company HR and also to your friends, and let me
 know if anyone is interested. We have an early-bird offer if registration
 is done before 31st May 2012.


 Big Data is one space that is buzzing in the market big time. There are
 several applications of various technologies involved around Big Data.
 Often, when we work on various projects or product development, we
 streamline our time and energy towards their successful delivery. To ensure
 your colleagues don't miss out on this hot topic and stay abreast of these
 niche things, we thought we would share our expertise with senior
 developers and architects through this workshop on Big Data Analysis and
 Management, which we have scheduled in Bangalore on June 16th and 17th.

 We will be covering various topics under the following 4 broad headlines.
 You can check the attached outline for a detailed insight into what we will
 cover under each head. It is definitely going to be an intensive and
 relevant hands-on session along with vivid explanation of the concepts and
 theories around it. On a lighter note, there are definitely going to be a
 lot of jargons flowing around all participants in this short span of two
 days.

 - Content Extraction (hands-on using Apache Tika)
 - Distribute Content in NOSQL ways (hands-on using Cassandra, Neo4j)
 - Search and Indexing (hands-on using Solr and Tika)
 - Distributed computing and analysis using Hadoop MapReduce and Mahout
   (hands-on using Hadoop MapReduce, Mahout)

 To register for this workshop, kindly send a mail to me along with the
 details of the participants (including their profiles, if possible) and the
 payment details.

 I am enclosing herewith the complete course details attached along with
 this mail. I, along with two of my peers, will be delivering this workshop.
 You can find our brief profiles mentioned in the attached content.

 Feel free to contact me any time for any queries.

 With best regards,
 Manish.