Index mysql database using data import handler in solr

2013-07-11 Thread archit2112
I want to index a MySQL database in Solr using the Data Import Handler.

I have made two tables. The first table holds the metadata of a file.

create table filemetadata (
id varchar(20) primary key ,
filename varchar(50),
path varchar(200),
size varchar(10),
author varchar(50)
) ;

+----+----------+----------+------+--------+
| id | filename | path     | size | author |
+----+----------+----------+------+--------+
| 1  | abc.txt  | c:\files | 2kb  | eric   |
| 2  | xyz.docx | c:\files | 5kb  | john   |
| 3  | pqr.txt  | c:\files | 10kb | mike   |
+----+----------+----------+------+--------+

The second table contains the favourite info for the files in
the table above.

create table filefav (
fid varchar(20) primary key ,
id varchar(20),
favouritedby varchar(300),
favouritedtime varchar(10),
FOREIGN KEY (id) REFERENCES filemetadata(id) 
) ;

++--+-++
| fid| id  | favouritedby  | favouritedtime   | 
++--+-++
| 1 | 1   | ross | 22:30   | 
++--+-++
| 2 | 1   | josh | 12:56   | 
++--+-++
| 3 | 2   | johny   | 03:03   | 
++--+-++
| 4 | 2   | sean | 03:45  | 
++--+-++

Here 'id' is a foreign key. The second table shows which person has
marked which document as his/her favourite. E.g. the file abc.txt, represented
by id = 1, has been marked favourite (see column favouritedby) by ross and
josh.


I want to index the files as follows:

each document should have the following fields

id   - to be taken from the first table filemetadata
filename - to be taken from the first table filemetadata
path - to be taken from the first table filemetadata
size - to be taken from the first table filemetadata
author   - to be taken from the first table filemetadata
favouritedby - this field should contain the names of all the people
from the second table, filefav (from the favouritedby column), who like that
particular file.

eg after indexing doc 1 should have

id = 1
filename = abc.txt
path = c:\files
size = 2kb
author = eric
favouritedby = ross, josh

How do I achieve this?

I have written a data-config.xml (which is not giving the desired result) as
follows:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3306/test" user="root" password="root" />
  <document name="filemetadata">

    <entity name="restaurant" query="select * from filemetadata">
      <field column="id" name="id" />

      <entity name="filefav" query="select favouritedby from filefav where id='${filemetadata.id}'">
        <field column="favouritedby" name="favouritedby1" />
      </entity>

      <field column="filename" name="name1" />
      <field column="path" name="path1" />
      <field column="size" name="size1" />
      <field column="author" name="author1" />
    </entity>

  </document>
</dataConfig>

Can anyone explain how I can achieve this?





Re: Indexing database in Solr using Data Import Handler

2013-07-11 Thread Gora Mohanty
On 11 July 2013 11:13, archit2112 archit2...@gmail.com wrote:


 I'm trying to index a MySQL database using the Data Import Handler in Solr.
[...]
 Everything is working, but the favouritedby1 field is not getting indexed,
 i.e., that field does not exist when I run the *:* query. Can you please
 help me out?

Please show us your schema.xml. Does it have
a favouritedby1 field, and the other fields that
you are trying to add through DIH?

Regards,
Gora
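A likely direction, sketched rather than verified: in DIH, the child entity's
placeholder must reference the enclosing entity by its name (here "restaurant",
not "filemetadata"), and schema.xml must declare the DIH target fields, with
favouritedby1 marked multiValued so it can hold several names:

In schema.xml:

  <field name="favouritedby1" type="string" indexed="true" stored="true" multiValued="true" />

In data-config.xml:

  <entity name="filefav"
          query="select favouritedby from filefav where id='${restaurant.id}'">
    <field column="favouritedby" name="favouritedby1" />
  </entity>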


Performance of cross join vs block join

2013-07-11 Thread mihaela olteanu
Hello,

Does anyone know about some measurements in terms of performance for cross 
joins compared to joins inside a single index?

Is a join inside a single index that stores all document types (from the parent
table and from the child tables) with a discriminator field faster than a cross
join (where each document type resides in its own index)?

I have performed some tests, but it seems to me that a join in a single
(bigger) index does not add much of a speed improvement compared to cross
joins.

Why would a block join be faster than a cross join, if that is the case? What
are the variables that count when trying to improve query execution time?

Thanks!
Mihaela

Re: Performance of cross join vs block join

2013-07-11 Thread Mikhail Khludnev
Mihaela,

For me it's reasonable that a single-core join takes the same time as a cross-
core one; I just can't see what gain could be obtained in the former case.
I can hardly comment on the join code; I have looked into it, and it's not
trivial, to say the least. With a block join, there is no need to obtain
parentId term values/numbers and look up parents by them. Both of these actions
are expensive. Also, block join works as an iterator, while join needs to
allocate memory for the parents bitset and populate it out of order, which
impacts scalability.
Also, in None scoring mode BJQ doesn't need to walk through all children, only
hit the first one. Another nice feature is 'both-side leapfrog': if you have a
highly restrictive filter/query that intersects with the BJQ, it allows
skipping many parents and children as well. That's not possible in Join, which
has a fairly 'full-scan' nature.
The main performance factor for Join is the number of child docs.
I'm not sure I got all your questions; please specify them in more detail
if something is still unclear.
Have you seen my benchmark?
http://blog.griddynamics.com/2012/08/block-join-query-performs.html
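
For readers comparing the two shapes, a sketch (core and field names here are
hypothetical; fromIndex is what makes Solr's standard {!join} parser cross-core):

q={!join from=parent_id to=id}color:red                       (join within one index)
q={!join from=parent_id to=id fromIndex=children}color:red    (cross-core join)

A block join, by contrast, relies on parent and child documents having been
indexed contiguously as a single block, which is what lets it iterate instead
of materializing a parents bitset.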



On Thu, Jul 11, 2013 at 1:52 PM, mihaela olteanu mihaela...@yahoo.com wrote:

 [...]




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


request to be added as a wiki contributor

2013-07-11 Thread Andrew MacKinlay
Hi,

My wiki username is AndyMacKinlay. Can I please be added to the 
ContributorsGroup?

Thanks,
Andy

Term component regex to remove stopwords

2013-07-11 Thread shruti suri
Hi,

Can the TermsComponent parameter terms.regex be used to ignore stop words?

Regards
Shruti





Problem using Term Component in solr

2013-07-11 Thread Parul Gupta(Knimbus)
Hi All

I am using the *Term component* in Solr for searching titles in short form,
using the wildcard patterns (.*) and [a-z0-9]*.

I am using the *Term Component* specifically because wildcards are not
working in a *select?q=* query search.

Examples of some *title *are:

1)Medicine, Health Care and Philosophy
2)Medical Physics
3)Physics of fluids
4)Medical Engineering and Physics

When I run this *Solr query*:
localhost:8080/solr3.6/OA/terms?terms.fl=title&terms.regex=phy.* fluids&terms.regex.flag=case_insensitive&terms.limit=10

The *output* is the 3rd title:

*Physics of fluids*

This is the relevant output.

But when I run this *Solr query*:

localhost:8080/solr3.6/OA/terms?terms.fl=title&terms.regex=med.* phy.*&terms.regex.flag=case_insensitive&terms.limit=10

The *output* is the 2nd and 4th titles:

*Medical Engineering and Physics*
*Medical Physics*

This is irrelevant. I want only one result for this query, i.e. *Medical
Physics*.

Although I have changed my wildcard pattern to *[a-z0-9]** instead of *.**, the
first query then doesn't work, as '*of*' is included in '*Physics of fluids*'.
However, the second query works fine.

An example of the query is:

localhost:8080/solr3.6/OA/terms?terms.fl=title&terms.regex=med[a-z0-9]* phy[a-z0-9]*&terms.regex.flag=case_insensitive&terms.limit=10

This works fine and gives one result, *Medical Physics*.


If there is another way of searching, using the *Term Component* or without
it, please suggest how to ignore such stop words.

Note: the Term Component works only on string dataType fields.  :(






Re: How to make 'fq' optional?

2013-07-11 Thread Mikhail Khludnev
https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/search/SwitchQParserPlugin.html

Hoss cares about you!
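
The idea from that javadoc, sketched with hypothetical parameter and field
names: the switch parser's bare case matches a missing or empty input and can
fall back to a match-all query, so the fq degrades gracefully when the variable
isn't passed:

/select?q=*:*&fq={!switch case='*:*' default=$user_fq v=$user_fq}&user_fq=first_name:peter

If user_fq is omitted, the bare case fires and the fq becomes *:*; if it is
supplied, the default branch parses its value as the filter.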


On Wed, Jul 10, 2013 at 10:40 PM, Learner bbar...@gmail.com wrote:

 I am trying to make a variable in fq optional,

 Ex:

 /select?first_name=peter&fq=$first_name&q=*:*

 I don't want the above query to throw an error or die whenever the variable
 first_name is not passed to the query; instead it should return the results
 corresponding to the rest of the query. I can use switch, but it's difficult
 to handle each and every case using switch (as I would need to handle a
 switch for so many variables)... Is there a way to resolve this some other
 way?







-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Performance of cross join vs block join

2013-07-11 Thread mihaela olteanu
In my current use case I have 4 tables with a one-to-many relationship between
them (one is the parent and the rest are children), and I have created a
separate Solr core for each table.
Now I have a requirement to return all parents that match a certain criterion,
or that have a child matching the same or a different criterion.
Given that moving all these documents into a single core implies more changes
to the current code than keeping the cores as they are, I also considered the
solution with a union of cross joins.
Next I performed some tests and saw that a join in a single core does not add
much compared to a union of cross joins, hence I don't know which solution
to adopt.
Do you see a use case where I would hit a wall if I keep the documents in
separate cores?

BTW, the link below does not work (I found it while searching this topic);
it displays an empty page.

Thanks,
Mihaela




 From: Mikhail Khludnev mkhlud...@griddynamics.com
 Subject: Re: Performance of cross join vs block join
 [...]

Re: amount of values in a multi value field - is denormalization always the best option?

2013-07-11 Thread Flavio Pompermaier
I also have a similar scenario, where fundamentally I have to retrieve all
urls where a userid has been found.
So, in my schema, I designed the url as the (string) key with a (possibly
huge) list of attributes automatically mapped to strings.
For example:

Url1 (key):
 - language: en
 - content:userid1
 - content:userid1
 - content:userid1 (i.e. 3 times actually for user 1)
 - content:userid2
 - content:userid3
 - author:userid4

and so on and so forth.
So, if I understood correctly, you're saying that this is a bad design? How
should I fix my schema, in your opinion, in that case?

Best,
Flavio


On Wed, Jul 10, 2013 at 11:53 PM, Jack Krupansky j...@basetechnology.com wrote:

 Simple answer: avoid a large number of values in a single document. There
 should only be a modest to moderate number of fields in a single document.

 Is the data relatively static, or subject to frequent updates? To update
 any field of a single document, even with atomic update, requires Solr to
 read and rewrite every field of the document. So, lots of smaller documents
 are best for a frequent update scenario.

 Multivalued fields are great for storing a relatively small list of
 values. You can add to the list easily, but under the hood, Solr must read
 and rewrite the full list as well as the full document. And, there is no
 way to address or synchronize individual elements of multivalued fields.

 Joins are great... if used in moderation. Heavy use of joins is not a
 great idea.

 -- Jack Krupansky

 -Original Message- From: Marcelo Elias Del Valle
 Sent: Wednesday, July 10, 2013 5:37 PM
 To: solr-user@lucene.apache.org
 Subject: amount of values in a multi value field - is denormalization
 always the best option?


 Hello,

    I have asked a question recently about Solr limitations and some about
 joins. It happens that this question is about both at the same time.
    I am trying to figure out how to denormalize my data so I will need just 1
 document in my index instead of performing a join. I figure one way of
 doing this is storing an entity as a multivalued field, instead of storing
 different fields.
Let me give an example. Consider the entities:

 User:
id: 1
type: Joan of Arc
age: 27

 Webpage:
id: 1
    url: http://wiki.apache.org/solr/Join
category: Technical
user_id: 1

id: 2
url: http://stackoverflow.com
category: Technical
user_id: 1

    Instead of creating 1 document for the user, 1 for webpage 1 and 1 for
 webpage 2 (1 parent and 2 children), I could store webpages in a user
 multivalued field, as follows:

 User:
id: 1
name: Joan of Arc
age: 27
    webpage1: [id: 1, url: http://wiki.apache.org/solr/Join, category:
 Technical]
    webpage2: [id: 2, url: http://stackoverflow.com, category:
 Technical]

    It would probably perform better than the join, right? However, it made
 me think about Solr limitations again. What if I have 200 million webpages
 (200 million fields) per user? Or imagine a case where I could have 200
 million values in a field, as in the case where I need to index every html
 DOM element (div, a, etc.) for each web page the user visited.
    I mean, if I need to do the query and this is a business requirement no
 matter what, although denormalizing could be better than using query-time
 joins, I wonder if distributing the data present in this single document
 across the cluster wouldn't give me better performance. And this is
 something I won't get with block joins or multivalued fields...
    I guess there is probably no right answer for this question (at least
 not a known one), and I know I should create a POC to check how each
 performs... But do you think such a large number of values in a single
 document could make denormalization impossible in an extreme case like
 this? Would you share my thoughts if I said denormalization is not always
 the right option?

 Best regards,
 --
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr



Re: request to be added as a wiki contributor

2013-07-11 Thread Erick Erickson
Done.

On Wed, Jul 10, 2013 at 10:25 PM, Andrew MacKinlay admac...@gmail.com wrote:
 [...]


Applying Sum on Field

2013-07-11 Thread Jamshaid Ashraf
Hi,

I'm a new Solr user, and I wanted to know: is there any way to apply a sum to
a field in the result documents of a group query?

Following is the query and its result set; I want to apply a sum to the
'price' field, grouping on 'type':


*Sample input:*

<doc>
  <str name="id">3</str>
  <str name="type">Caffe</str>
  <str name="content">Yummm Drinking a latte at Caffe Grecco in SF's historic
North Beach Learning text analysis with SolrInAction by Manning on my
iPad</str>
  <long name="_version_">1440257540658036736</long>
  <int name="price">250</int>
</doc>
<doc>
  <str name="id">1</str>
  <str name="type">Caffe</str>
  <str name="content">Yummm Drinking a latte at Caffe Grecco in SF's historic
North Beach Learning text analysis with SolrInAction by Manning on my
iPad</str>
  <long name="_version_">1440257592044552192</long>
  <int name="price">100</int>
</doc>
*Query:*
http://localhost:8080/solr/collection2/select?q=caffe&df=content&group=true&group.field=type

your help will be greatly appreciated!

Regards,
Jamshaid


Re: Commit different database rows to solr with same id value?

2013-07-11 Thread Erick Erickson
Just use the core's address in the URL. You don't have to use the core name
if the defaults are set (the default core is usually collection1).

So it's something like http://host:port/solr/core2/update? blah blah blah
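
The same idea in SolrJ, as a sketch (URLs and core names are hypothetical; one
HttpSolrServer per core keeps same-id documents from overwriting each other):

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MultiCoreExample {
  public static void main(String[] args) throws SolrServerException, IOException {
    // one client instance per core; each core is a separate index with its own uniqueKey space
    HttpSolrServer core1 = new HttpSolrServer("http://localhost:8983/solr/core1");
    HttpSolrServer core2 = new HttpSolrServer("http://localhost:8983/solr/core2");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "3"); // same key value as a row already indexed in core1
    core2.add(doc);          // lands in core2, so it does not overwrite core1's document
    core2.commit();

    core1.shutdown();
    core2.shutdown();
  }
}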

Erick

On Wed, Jul 10, 2013 at 4:17 PM, Jason Huang jason.hu...@icare.com wrote:
 Thanks David.

 I am actually trying to commit the database row on the fly, not DIH. :)

 Anyway, if I understand you correctly, basically you are suggesting to
 modify the value of the primary key and pass the new value to id before
 committing to solr. This could probably be one solution.

 What if I want to commit the data from table2 to a new core? Anyone knows
 how I can do that?

 thanks,

 Jason

 On Wed, Jul 10, 2013 at 11:18 AM, David Quarterman da...@corexe.com wrote:

 Hi Jason,

 Assuming you're using DIH, why not build a new, unique id within the query
 to use as the 'doc_id' for Solr? We do something like this in one of our
 collections. In MySQL, try this (don't know what it would be for any other
 db but there must be equivalents):

 select @rownum:=@rownum+1 rowid, t.* from (main select query) t, (select
 @rownum:=0) s

 Regards,

 DQ

 -Original Message-
 From: Jason Huang [mailto:jason.hu...@icare.com]
 Sent: 10 July 2013 15:50
 To: solr-user@lucene.apache.org
 Subject: Commit different database rows to solr with same id value?

 Hello,

 I am trying to use Solr to store fields from two different database
 tables, where the primary keys are in the format of 1, 2, 3, 

 In Java, we build different POJO classes for these two database tables:

 table1.java

 @SolrIndex(name="id")

 private String idTable1

 


 table2.java

 @SolrIndex(name="id")

 private String idTable2



 And later we add these fields defined in the two different types of tables
 and commit it to solrServer.


 Here is the scenario where I am having issues:

 (1) commit a row from table1 with primary key = 3, this generates a
 document in Solr

 (2) commit another row from table2 with the same value of primary key =
 3, this overwrites the document generated in step (1).


 What we really want to achieve is to keep both rows in (1) and (2) because
 they are from different tables. I've read something from a Google search, and
 it appears that we might be able to do it by keeping multiple cores in
 Solr. Could anyone point out how to implement multiple cores to achieve this?
 To be more specific, when I commit the row as a document, I don't have a
 place to pick a certain core and I am not sure if it makes any sense for me
 to specify a core when I commit the document since the layer I am working
 on should abstract it away from me.



 The second question is - if we don't want to do a multicore (since we
 can't easily search for related data between multiple cores), how can we
 resolve this issue so that both rows from different database tables which share
 the same primary key still exist? We don't want to have to always change
 the primary key format to ensure a uniqueness of the primary key among all
 different types of database tables.


 thanks!


 Jason



Re: Applying Sum on Field

2013-07-11 Thread Peter Sturge
Hi,

If you mean adding up numeric values stored in fields - no, Solr doesn't do
this by default.
We had a similar requirement for this, and created a custom SearchComponent
to handle sum, average, stats etc.
There are a number of things you need to bear in mind, such as:
  * Handling errors when a query asks for sums on fields that are
non-numeric
  * Performance issues - e.g. are you willing to wait to add up 50 million
fields of stringified numbers
  * How to return result payloads in a client-friendly way
  * Be prepared to coalesce results from multi-shard/distributed queries.
It's not trivial, but it is do-able.
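
The skeleton of such a component under the Solr 4.x API, as a sketch (the class
name and response key are hypothetical, and the accumulation logic is omitted):

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// registered in solrconfig.xml and added to a request handler's component list
public class SumComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // read request params here, e.g. which field to sum over
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // walk the matching docs, accumulate the numeric field,
    // then attach the result to the response
    rb.rsp.add("sum", 0.0); // placeholder value only
  }

  @Override
  public String getDescription() { return "sums a numeric field over results"; }

  @Override
  public String getSource() { return ""; }
}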

Peter




On Thu, Jul 11, 2013 at 12:56 PM, Jamshaid Ashraf jamshaid...@gmail.com wrote:

 [...]


Re: Commit different database rows to solr with same id value?

2013-07-11 Thread Jason Huang
Cool.

So far I've been using only the default collection1.

thanks,

Jason

On Thu, Jul 11, 2013 at 7:57 AM, Erick Erickson erickerick...@gmail.com wrote:

 [...]


Solr caching clarifications

2013-07-11 Thread Manuel Le Normand
Hello,
As a result of frequent Java OOM exceptions, I am trying to investigate the
Solr JVM heap usage more closely.
Please correct me if I am mistaken; this is my understanding of the heap's
usages (per replica on a Solr instance):
1. Buffers for indexing - bounded by ramBufferSizeMB
2. Solr caches
3. Segment merges
4. Miscellaneous - buffers for tlogs, servlet overhead, etc.

Particularly I'm concerned by Solr caches and segment merges.
1. How much memory (bytes per doc) do filterCache entries (BitDocSet) and
queryResultCache entries (DocList) consume? I understand it is related to the
gaps between the doc ids that match (so it's not always saved as a bitmap).
But basically, is every id saved as a Java int? (See the estimate after this
list.)
2. Does queryResultMaxDocsCached (for example = 100) mean that any query
resulting in more than 100 docs will not be cached (at all) in the
queryResultCache? Or does it have to do with the documentCache?
3. documentCache - the wiki says it should be greater than
max_results * concurrent_queries. Max results is just the number of rows
displayed (rows - start), right? Not queryResultWindowSize.
4. enableLazyFieldLoading=true - when querying for ids only (fl=id), will this
cache be used (at the expense of evicting docs that were already loaded with
stored fields)?
5. How much heap do merges use? Assuming we have a merge of 10 segments of
500MB each (half inverted files - *.pos, *.doc, etc.; half non-inverted
files - *.fdt, *.tvd), how much heap should be left unused for this merge?
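
On question 1, a rough back-of-the-envelope sketch (assuming Solr's two DocSet
representations: BitDocSet costs one bit per document in the index, while a
sorted int set costs 4 bytes per matching document):

  BitDocSet:       maxDoc / 8 bytes       e.g. 100M docs   -> ~12.5 MB per cached filter
  sorted int set:  4 * numMatches bytes   e.g. 50K matches -> ~200 KB for that entry

Solr uses the int-set form when the match count is small relative to the
index size.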

Thanks in advance,
Manu


solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper

2013-07-11 Thread Zhang, Lisheng
Hi,
 
We are testing Solr 4.3.0 in Tomcat (considering upgrading from Solr 3.6.1 to
4.3.0). In the wiki page for SolrCloud in Tomcat:
 
http://wiki.apache.org/solr/SolrCloudTomcat
 
we need to link each collection explicitly:
 
///
8) Link uploaded config with target collection
java -classpath .:/home/myuser/solr-war-lib/* org.apache.solr.cloud.ZkCLI -cmd 
linkconfig -collection mycollection -confname ...
///
 
But our application has many cores (a few thousand), which all share the same
schema/config. Is there a more convenient way?
 
Thanks very much for helps, Lisheng


What happens in indexing request in solr cloud if Zookeepers are all dead?

2013-07-11 Thread Zhang, Lisheng
Hi,
 
The latest SolrCloud doc mentions that if all ZooKeepers are dead,
distributed queries still work because Solr remembers the cluster state.

What about the handling of indexing requests if all ZooKeepers are dead? Does
Solr need ZooKeeper to know which box is master and which is slave for
indexing to work? Could Solr remember master/slave relations without
ZooKeeper?

Also, the doc said the ZooKeeper quorum needs a majority rule, so that we must
have 3 ZooKeepers to handle the case where one instance has crashed. What
would happen if we have two instances in the quorum and one instance crashes
(or a quorum of 3 instances where two of them have crashed)? I would expect
the last one to take over?
 
Thanks very much for helps, Lisheng
 
 


Solr 4.3.0 memory usage is higher than solr 3.6.1?

2013-07-11 Thread Zhang, Lisheng
Hi,
 
We are testing Solr 4.3.0 in Tomcat (considering upgrading from Solr 3.6.1 to
4.3.0); we have many cores (a few thousand).

We have noticed that Solr 4.3.0's memory usage is much higher than Solr
3.6.1's (without using SolrCloud yet). With 2K cores, Solr 3.6.1 uses 1.5G,
but Solr 4.3.0 uses close to 3G of memory when Tomcat is initially started.

We used shareSchema and sharedLib, and we also disabled searcher warm-up
during startup.

We are still debugging the issue; we would appreciate any guidance you could
provide.
 
Thanks very much for helps, Lisheng
 
 


nested queries + joins performance

2013-07-11 Thread Marcelo Elias Del Valle
Hello,

Continuing to have fun with joins, I finally figured out a way to make my
joins work. Suppose I have inserted the data below, using SolrJ.
If I want to select a parent (room) that has both:

   - a keyboard and a mouse
   - a monitor and a tablet

 In my data below, only room2 should be a match. I was able to get
this working using the following Solr query:

q=*:* AND _query_:"{!join from=root_id to=id}acessory1:Keyboard AND
acessory2:Mouse" AND _query_:"{!join from=root_id to=id}acessory1:Monitor
AND acessory2:Tablet"

As we can see, I am using nested queries. I have a result for each join,
and the results are merged as I was expecting. The problem is that I can have
about 20 nested joins in a query, sometimes.
Question: How is this performed in Solr, under the hood? If I have 100
million documents, will all my joins (20 joins, for instance) be applied to
100 million documents? Or will they be applied to the result of the prior
join?
For instance, suppose my first query returns 10 documents (selected
among 100 million). Will the other queries apply only to this result, or
will they apply to the entire index with the results merged afterwards?



Data inserted with solrJ:

public void insertDocuments() throws DataGrinderException,
        SolrServerException, IOException, DGIndexException {
    SolrServer solr = DGSolrServer.get();
    // Add parent
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "room1");
    doc.addField("cor_parede", "white");
    doc.addField("num_cadeiras", 34);
    solr.add(doc);

    // Add children
    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField("id", "computer1");
    doc2.addField("acessory1", "Keyboard");
    doc2.addField("acessory2", "Mouse");
    doc2.addField("root_id", "room1");
    solr.add(doc2);

    doc2 = new SolrInputDocument();
    doc2.addField("id", "computer2");
    doc2.addField("acessory1", "Monitor");
    doc2.addField("acessory2", "Mouse");
    doc2.addField("root_id", "room1");
    solr.add(doc2);

    doc2 = new SolrInputDocument();
    doc2.addField("id", "computer3");
    doc2.addField("acessory1", "Keyboard");
    doc2.addField("acessory2", "Camera");
    doc2.addField("root_id", "room1");
    solr.add(doc2);

    doc2 = new SolrInputDocument();
    doc2.addField("id", "computer4");
    doc2.addField("acessory1", "Tablet");
    doc2.addField("acessory2", "Mouse USB");
    doc2.addField("root_id", "room1");
    solr.add(doc2);

    // Add parent
    doc = new SolrInputDocument();
    doc.addField("id", "room2");
    doc.addField("cor_parede", "black");
    doc.addField("num_cadeiras", 35);
    solr.add(doc);

    // Add children
    doc2 = new SolrInputDocument();
    doc2.addField("id", "computer5");
    doc2.addField("acessory1", "Keyboard");
    doc2.addField("acessory2", "Mouse");
    doc2.addField("root_id", "room2");
    solr.add(doc2);

    doc2 = new SolrInputDocument();
    doc2.addField("id", "computer6");
    doc2.addField("acessory1", "Monitor");
    doc2.addField("acessory2", "Tablet");
    doc2.addField("root_id", "room2");
    solr.add(doc2);

    UpdateResponse response = solr.add(doc);
    if (response.getStatus() != 0)
        throw new DGIndexException("Could not insert document to solr!");
    solr.commit();
}



Best regards,
-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


Re: amount of values in a multi value field - is denormalization always the best option?

2013-07-11 Thread Marcelo Elias Del Valle
Hello Flavio,

Out of curiosity, are you already using this in prod? Would you share
your results / benchmarks with us (not sure if you have some)? I wonder
how it is performing for you.
I was thinking of using a very similar schema to yours. The thing is: each
option has drawbacks; there is no "good" or "bad" schema, if I understood
things correctly. Even joins, which are something we should avoid in a NoSQL
technology like Solr, may be a good option in some cases. I guess sometimes
the only things that can answer some questions are POCs and benchmarks. I am
not a Solr expert, and there are several committers on this list that can
help you much better than I, but the way I see it, you should try your
solution, see how it performs, and keep looking for alternatives that perform
better, if possible.
 As I said, I am not an expert, but I wouldn't call your model a bad
model that needs a fix. It's a possible model, and who knows, maybe another
model could perform better. It's like with an algorithm: we should assume we
can always do better...

Best regards,
Marcelo.


2013/7/11 Flavio Pompermaier pomperma...@okkam.it

 [...]

edismax behaviour with japanese

2013-07-11 Thread Shalom Ben-Zvi Kazaz
Hello,
I have text and text_ja fields, where text uses an English analyzer and
text_ja a Japanese one; I index both via copyField from other fields.
I'm trying to search both fields using edismax and the qf parameter, but I
see strange behaviour from edismax. I wonder if someone can give me a
hint as to what's going on and what I am doing wrong.

When I run this query, I can see that Solr searches both fields, but the
text_ja: query has only partial text while text: has the complete text.
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このたびは
<lst name="debug">
  <str name="rawquerystring">このたびは</str>
  <str name="querystring">このたびは</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((text_ja:たび | text:このたびは)))/no_coord</str>
  <str name="parsedquery_toString">+(text_ja:たび | text:このたびは)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>


Now, if I remove the last two characters from the query string, Solr will
not search text_ja; at least that's what I understand from the debug
output:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このた
<lst name="debug">
  <str name="rawquerystring">このた</str>
  <str name="querystring">このた</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((text:このた)))/no_coord</str>
  <str name="parsedquery_toString">+(text:このた)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>

With another string of Japanese text, Solr now splits the query into multiple
text_ja terms:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=システムをお買い求めいただき
<lst name="debug">
  <str name="rawquerystring">システムをお買い求めいただき</str>
  <str name="querystring">システムをお買い求めいただき</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)))/no_coord</str>
  <str name="parsedquery_toString">+(((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>



Thank you.


Re: amount of values in a multi value field - is denormalization always the best option?

2013-07-11 Thread Flavio Pompermaier
Yeah, probably you're right... I have to test different configurations!
That's why I'd like to know the available solutions in advance. Fortunately
I'm still developing, so I'm still in a position to investigate the solution.
Obviously I'll do some benchmarking on it, but I should know the
alternatives... so I asked the list!
I'm sure someone will give me some hints, at least I hope :)

Best,
Flavio



On Thu, Jul 11, 2013 at 3:46 PM, Marcelo Elias Del Valle mvall...@gmail.com wrote:

 [...]

How to boost relevance based on distance and age..

2013-07-11 Thread Vineel


Here is the structure of the Solr document:

   <doc>
     <str name="latlong">52.401790,4.936660</str>
     <date name="dateOfBirth">1993-12-09T00:00:00Z</date>
   </doc>

I would like to search for documents based on the following weighted
criteria:

- distance 0-10miles weight 40
- distance 10miles and above weight 20
- Age 0-20years weight 20
- Age 20years and above weight 10

I am wondering what the recommended approaches are to build Solr queries for
this; one possible sketch follows.
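
One possible shape, as a sketch (not a recommendation; it assumes latlong is a
location-type field so that geodist() works, and it approximates the 20-year
age cutoff in milliseconds): edismax boost functions that map each band to its
weight:

q=*:*&defType=edismax
&sfield=latlong&pt=52.4,4.9
&bf=map(geodist(),0,16.1,40,20)
&bf=map(ms(NOW,dateOfBirth),0,631152000000,20,10)

Here map(x,min,max,target,default) yields the target weight inside the band
and the default outside it, and geodist() returns kilometers, so 10 miles is
roughly 16.1 km.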

Thanks
-Vineel






Too many documents, composite IndexReaders cannot exceed 2147483647

2013-07-11 Thread Manuel Ignacio Lopez
Hello everybody,

Somehow we managed to overload our Solr 4.2.0 server with too many documents
(many of which are already deleted, but the index is not optimized). Now Solr
cannot be started anymore; see the full stack trace below.

Caused by: java.lang.IllegalArgumentException: Too many documents, composite
IndexReaders cannot exceed 2147483647
        at org.apache.lucene.index.BaseCompositeReader.<init>(BaseCompositeReader.java:79)
        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:339)
        at org.apache.lucene.index.StandardDirectoryReader.<init>(StandardDirectoryReader.java:42)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:71)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
        at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:124)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1391)

We would like to bring Solr up, at least in a maintenance mode, to perform the
optimize, after which the deleted documents should be removed and we would
have only 1.5 billion docs. How can we accomplish this?

Thanks and regards
Manuel



SolrJ and initializing logger in solr 4.3?

2013-07-11 Thread Jonathan Rochkind

I am using SolrJ in a Java (actually jruby) project, with Solr 4.3.

When I instantiate an HttpSolrServer, I get the dreaded:

log4j:WARN No appenders could be found for logger 
(org.apache.solr.client.solrj.impl.HttpClientUtil).

log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for 
more info.



Using SolrJ as an embedded library in my own software, what is the 
proper or 'best practice' way -- or failing that, just any way at all -- 
to initialize log4j under Solr 4.3?


I am not super familiar with Java or log4j; hopefully there is an easy 
way to do this?


(If someone has a way especially suited for jruby, even better; but just 
a standard Java answer would be great too.)


Thanks for any advice!


Re: SolrJ and initializing logger in solr 4.3?

2013-07-11 Thread Michael Della Bitta
Hi Jonathan,

I think you just need some config on the classpath:

http://logging.apache.org/log4j/1.2/manual.html#defaultInit
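
For example, a minimal log4j.properties placed on the application classpath
(a sketch; the level and layout are arbitrary choices, not requirements):

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n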

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions
w: http://www.appinions.com/


On Thu, Jul 11, 2013 at 12:45 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 [...]



Re: amount of values in a multi value field - is denormalization always the best option?

2013-07-11 Thread Jack Krupansky

Again, generally: if the number of values is relatively modest, you don't
need to discriminate (tell which one matched on a search), and you don't edit
the list, a multivalued field makes perfect sense; but if any of those
requirements is not true, then you need to represent the items as discrete
Solr documents.

But, it does all depend on your particular data and particular requirements.

-- Jack Krupansky

-Original Message- 
From: Flavio Pompermaier

Sent: Thursday, July 11, 2013 7:50 AM
To: solr-user@lucene.apache.org
Subject: Re: amount of values in a multi value field - is denormalization
always the best option?

[...]


Re: What happens in indexing request in solr cloud if Zookeepers are all dead?

2013-07-11 Thread Jack Krupansky
There are no masters or slaves in SolrCloud - it is fully distributed and 
master-free. Leaders are temporary and can vary over time.


The basic idea for quorum is to prevent split brain - two (or more) 
distinct sets of nodes (zookeeper nodes, that is) each thinking they 
constitute the authoritative source for access to configuration information. 
The trick is to require (N/2)+1 nodes for quorum. For n=3, quorum would be 
(3/2)+1 = 1+1 = 2, so one node can be down. For n=1, quorum = (1/2)+1 = 0 + 
1 = 1. For n=2, quorum would be (2/2)+1 = 1 + 1 = 2, so no nodes can be 
down. IOW, for n=2 no nodes can be down for the cluster to do updates.
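
Tabulating that rule (quorum = (N/2)+1 with integer division) for common
ensemble sizes:

  N    quorum    tolerable failures
  1    1         0
  2    2         0
  3    2         1
  5    3         2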


-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng

Sent: Thursday, July 11, 2013 9:28 AM
To: solr-user@lucene.apache.org
Subject: What happens in indexing request in solr cloud if Zookeepers are 
all dead?


[...]




Re: Applying Sum on Field

2013-07-11 Thread Jack Krupansky
Take a look at the stats component, which calculates aggregate values. It
has a facet parameter that may or may not give you something similar to
what you want. Or just form a query that matches the results of the group,
and then get the stats.


See:
http://wiki.apache.org/solr/StatsComponent
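
For instance (reusing the collection and field names from the question below;
hedged, untested):

http://localhost:8080/solr/collection2/select?q=*:*&rows=0&stats=true&stats.field=price&stats.facet=type

That returns sum, min, max, count, etc. for price, broken down by each value 
of type.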

-- Jack Krupansky

-Original Message- 
From: Jamshaid Ashraf

Sent: Thursday, July 11, 2013 7:56 AM
To: solr-user@lucene.apache.org
Subject: Applying Sum on Field

Hi,

I'm a new solr user. I wanted to know: is there any way to apply a sum to a
field across the result documents of a group query?

Following are the query and its result set. I wanted to apply a sum to the
'price' field, grouping on 'type':


Sample input:

<doc>
  <str name="id">3</str>
  <str name="type">Caffe</str>
  <str name="content">Yummm  Drinking a latte at Caffe Grecco in SF's historic North Beach  Learning text analysis with SolrInAction by Manning on my iPad</str>
  <long name="_version_">1440257540658036736</long>
  <int name="price">250</int>
</doc>
<doc>
  <str name="id">1</str>
  <str name="type">Caffe</str>
  <str name="content">Yummm  Drinking a latte at Caffe Grecco in SF's historic North Beach  Learning text analysis with SolrInAction by Manning on my iPad</str>
  <long name="_version_">1440257592044552192</long>
  <int name="price">100</int>
</doc>

Query:
http://localhost:8080/solr/collection2/select?q=caffe&df=content&group=true&group.field=type

Your help will be greatly appreciated!

Regards,
Jamshaid 



Thousands of cluster state change events per second from zookeeper

2013-07-11 Thread Sundararaju, Shankar
Hi,

We have 3 search client nodes connected to a 12x2 Solr 4.2.1 cluster through 
CloudSolrServer. We are noticing thousands of such events being logged every 
second on these client nodes, quickly filling up the logs. Are there any 
known bugs in Zookeeper or the SolrJ client that can cause this? When we 
restarted one of the search client nodes, the notifications stopped on all 3 
clients, but I am sure they will reappear, because this behavior is 
intermittent. It does not seem to be correlated with indexing, since the 
notifications happen whether or not indexing is in progress.

Jul 11 2013 10:38:18.537 PDT [-0700] [http-8080-1-EventThread] INFO  
o.a.solr.common.cloud.ZkStateReader - A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 24)
Jul 11 2013 10:38:18.538 PDT [-0700] [http-8080-1-EventThread] INFO  
o.a.solr.common.cloud.ZkStateReader - A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 24)
Jul 11 2013 10:38:18.540 PDT [-0700] [http-8080-1-EventThread] INFO  
o.a.solr.common.cloud.ZkStateReader - A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json,

Thanks
-Shankar



RE: What happens in indexing request in solr cloud if Zookeepers are all dead?

2013-07-11 Thread Zhang, Lisheng
Yes, I should not have used the words master/slave for solr cloud!

So if all Zookeepers are dead, could indexing requests be
handled properly (could solr remember the setting for indexing)?

Thanks very much for helps, Lisheng

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, July 11, 2013 10:46 AM
To: solr-user@lucene.apache.org
Subject: Re: What happens in indexing request in solr cloud if
Zookeepers are all dead?


There are no masters or slaves in SolrCloud - it is fully distributed and 
master-free. Leaders are temporary and can vary over time.

The basic idea for quorum is to prevent split brain - two (or more) 
distinct sets of nodes (zookeeper nodes, that is) each thinking they 
constitute the authoritative source for access to configuration information. 
The trick is to require (N/2)+1 nodes for quorum. For n=3, quorum would be 
(3/2)+1 = 1+1 = 2, so one node can be down. For n=1, quorum = (1/2)+1 = 0 + 
1 = 1. For n=2, quorum would be (2/2)+1 = 1 + 1 = 2, so no nodes can be 
down. IOW, for n=2 no nodes can be down for the cluster to do updates.

-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng
Sent: Thursday, July 11, 2013 9:28 AM
To: solr-user@lucene.apache.org
Subject: What happens in indexing request in solr cloud if Zookeepers are 
all dead?

Hi,

In solr cloud latest doc, it mentioned that if all Zookeepers are dead, 
distributed
query still works because solr remembers the cluster state.

How about the indexing request handling if all Zookeepers are dead, does 
solr
needs Zookeeper to know which box is master and which is slave for indexing 
to
work? Could solr remember master/slave relations without Zookeeper?

Also doc said Zookeeper quorum needs to have a majority rule so that we must
have 3 Zookeepers to handle the case one instance is crashed, what would
happen if we have two instances in quorum and one instance is crashed (or 
quorum
having 3 instances but two of them are crashed)? I felt the last one should 
take
over?

Thanks very much for helps, Lisheng




Re: Moving replica from node to node?

2013-07-11 Thread Mark Miller
Yeah, though CREATE and UNLOAD end up being kind of funny descriptors.
You'd think LOAD and UNLOAD or CREATE and DELETE or something...


On Wed, Jul 10, 2013 at 11:35 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Thanks Mark.  I assume you are referring to using the Core Admin API -
 CREATE and UNLOAD?

 Added https://issues.apache.org/jira/browse/SOLR-5032

 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



 On Mon, Jul 8, 2013 at 10:50 PM, Mark Miller markrmil...@gmail.com
 wrote:
  It's simply a sugar method that no one has gotten to yet. I almost have
 once or twice, but I always have moved onto other things before even
 starting.
 
  It's fairly simple to just start another replica on the TO node and then
 delete the replica on the FROM node, so not a lot of urgency.
 
  - Mark
 
  On Jul 8, 2013, at 10:18 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:
 
  Hi,
 
  Solr(Cloud) currently doesn't have any facility to move a specific
  replica from one node to the other.
 
  How come?  Is there a technical or philosophical reason, or just the
  24 hours/day reason?
 
  Thanks,
  Otis
  --
  Solr  ElasticSearch Support -- http://sematext.com/
  Performance Monitoring -- http://sematext.com/spm
 




-- 
- Mark
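
For the record, that manual recipe maps onto two Core Admin calls, roughly
like this (hosts, core names and the deleteIndex flag are illustrative, not a
tested sequence):

http://to-host:8983/solr/admin/cores?action=CREATE&name=mycoll_shard1_replica2&collection=mycoll&shard=shard1
http://from-host:8983/solr/admin/cores?action=UNLOAD&core=mycoll_shard1_replica1&deleteIndex=true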


What does too many merges...stalling in indexwriter log mean?

2013-07-11 Thread Tom Burton-West
Hello,

We are seeing the message "too many merges...stalling" in our indexwriter
log.   Is this something to be concerned about?  Does it mean we need to
tune something in our indexing configuration?

Tom


Leader Election, when?

2013-07-11 Thread aabreur
I have a working Zookeeper ensemble running with 3 instances and also a
solrcloud cluster with some solr instances. I've created a collection
configured with 2 shards. Then I:

create 1 core on instance1
create 1 core on instance2
create 1 core on instance1
create 1 core on instance2

Just to have this configuration:

instance1: shard1_leader, shard2_replica
instance2: shard1_replica, shard2_leader

If I add 2 cores to instance1 and then 2 cores to instance2, both leaders will
be on instance1 and no re-election is done:

instance1: shard1_leader, shard2_leader
instance2: shard1_replica, shard2_replica

Back to my ideal scenario (detached leaders): when I add a third
instance with 2 replicas and kill one of my instances running a leader, the
election picks the instance that already has a leader.

My question is why Zookeeper behaves this way. Shouldn't it distribute
leaders? If I put some stress on a double-leader instance, is Zookeeper
going to run an election?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Leader-Election-when-tp4077381.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Moving replica from node to node?

2013-07-11 Thread Alan Woodward
And CREATE and UNLOAD are almost exactly the wrong descriptors, because CREATE 
loads up a core that's already there, and UNLOAD can in fact delete it from the 
filesystem…

Alan Woodward
www.flax.co.uk


On 11 Jul 2013, at 20:15, Mark Miller wrote:

 Yeah, though CREATE and UNLOAD end up being kind of funny descriptors.
 You'd think LOAD and UNLOAD or CREATE and DELETE or something...
 
 
 On Wed, Jul 10, 2013 at 11:35 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:
 
 Thanks Mark.  I assume you are referring to using the Core Admin API -
 CREATE and UNLOAD?
 
 Added https://issues.apache.org/jira/browse/SOLR-5032
 
 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm
 
 
 
 On Mon, Jul 8, 2013 at 10:50 PM, Mark Miller markrmil...@gmail.com
 wrote:
 It's simply a sugar method that no one has gotten to yet. I almost have
 once or twice, but I always have moved onto other things before even
 starting.
 
 It's fairly simple to just start another replica on the TO node and then
 delete the replica on the FROM node, so not a lot of urgency.
 
 - Mark
 
 On Jul 8, 2013, at 10:18 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:
 
 Hi,
 
 Solr(Cloud) currently doesn't have any facility to move a specific
 replica from one node to the other.
 
 How come?  Is there a technical or philosophical reason, or just the
 24 hours/day reason?
 
 Thanks,
 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm
 
 
 
 
 
 -- 
 - Mark



SolrJ 4.3 to Solr 1.4

2013-07-11 Thread Jonathan Rochkind
So, trying to use a SolrJ 4.3 to talk to an old Solr 1.4. Specifically 
to add documents.


The wiki at http://wiki.apache.org/solr/Solrj suggests, I think, that 
this should work, so long as you:


server.setParser(new XMLResponseParser());

However, when I do this, I still get a 
org.apache.solr.common.SolrException: parsing error from 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:143)


(If I _don't_ setParser to XML, and use the binary parser... I get a 
fully expected error about binary format corruption -- that part is 
expected and I understand it, that's why you have to use the 
XMLResponseParser instead).


Am I not doing enough to my SolrJ 4.3 to get it to talk to the Solr 1.4 
server in pure XML? I've set the parser to the XMLResponseParser, do I 
also have to somehow tell it to actually use the Solr 1.4 XML update 
handler or something?  I don't entirely understand what I'm talking about.


Alternately... is it just a lost cause trying to get SolrJ 4.3 to talk 
to Solr 1.4, is the wiki wrong that this is possible?


Thanks for any help,

Jonathan
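
For reference, the setup being described boils down to something like this (a
sketch, not a tested recipe; the URL and field values are placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.common.SolrInputDocument;

public class LegacyIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://oldhost:8080/solr");
        // Responses must be parsed as XML; the javabin wire format
        // changed between 1.4 and 4.x.
        server.setParser(new XMLResponseParser());

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);   // requests go out as XML by default in SolrJ
        server.commit();
    }
}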


Re: What happens in indexing request in solr cloud if Zookeepers are all dead?

2013-07-11 Thread Jack Krupansky
Sorry, no updates if no Zookeepers. There would be no way to assure that any 
node knows the proper configuration. Queries are a little safer, since they can 
use the most recent configuration without zookeeper, but update consistency 
requires accurate configuration information.


-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng

Sent: Thursday, July 11, 2013 2:59 PM
To: solr-user@lucene.apache.org
Subject: RE: What happens in indexing request in solr cloud if Zookeepers 
are all dead?


Yes, I should not have used word master/slave for solr cloud!

So if all Zookeepers are dead, could indexing requests be
handled properly (could solr remember the setting for indexing)?

Thanks very much for helps, Lisheng

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, July 11, 2013 10:46 AM
To: solr-user@lucene.apache.org
Subject: Re: What happens in indexing request in solr cloud if
Zookeepers are all dead?


There are no masters or slaves in SolrCloud - it is fully distributed and
master-free. Leaders are temporary and can vary over time.

The basic idea for quorum is to prevent split brain - two (or more)
distinct sets of nodes (zookeeper nodes, that is) each thinking they
constitute the authoritative source for access to configuration information.
The trick is to require (N/2)+1 nodes for quorum. For n=3, quorum would be
(3/2)+1 = 1+1 = 2, so one node can be down. For n=1, quorum = (1/2)+1 = 0 +
1 = 1. For n=2, quorum would be (2/2)+1 = 1 + 1 = 2, so no nodes can be
down. IOW, for n=2 no nodes can be down for the cluster to do updates.

-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng

Sent: Thursday, July 11, 2013 9:28 AM
To: solr-user@lucene.apache.org
Subject: What happens in indexing request in solr cloud if Zookeepers are
all dead?

Hi,

In solr cloud latest doc, it mentioned that if all Zookeepers are dead,
distributed
query still works because solr remembers the cluster state.

How about the indexing request handling if all Zookeepers are dead, does
solr
needs Zookeeper to know which box is master and which is slave for indexing
to
work? Could solr remember master/slave relations without Zookeeper?

Also doc said Zookeeper quorum needs to have a majority rule so that we must
have 3 Zookeepers to handle the case one instance is crashed, what would
happen if we have two instances in quorum and one instance is crashed (or
quorum
having 3 instances but two of them are crashed)? I felt the last one should
take
over?

Thanks very much for helps, Lisheng




Re: SolrJ 4.3 to Solr 1.4

2013-07-11 Thread Chris Hostetter

: However, when I do this, I still get a org.apache.solr.common.SolrException:
: parsing error from
: 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:143)

it's impossible to guess what the underlying problem might be unless you 
can provide us the full error.

The one thing i can think of that might not be obvious is that the legacy 
header format (query param version=1.0 vs version=2.0) might be confusing 
the XMLResponseParser ... i don't remember when the default changed but i 
thought it was *before* Solr 1.4

https://wiki.apache.org/solr/XMLResponseFormat#A.27version.27

-Hoss


Re: Partial Matching in both query and field

2013-07-11 Thread James Bathgate
Jack,

This still isn't working. I just upgraded to 3.6.2 to verify that wasn't
the issue.

Here's query information:

<lst name="params">
  <str name="debugQuery">on</str>
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">0_extrafield1_n:20454</str>
  <str name="rows">10</str>
  <str name="version">2.2</str>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">0_extrafield1_n:20454</str>
  <str name="querystring">0_extrafield1_n:20454</str>
  <str name="parsedquery">PhraseQuery(0_extrafield1_n:"2o45 o454 2o454")</str>
  <str name="parsedquery_toString">0_extrafield1_n:"2o45 o454 2o454"</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
</lst>

Here's the applicable lines from schema.xml:

<fieldType name="ngram" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
        splitOnNumerics="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0"
        replacement="o" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
        replacement="i" replace="all"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
        maxGramSize="16"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4"
        maxGramSize="16"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0"
        replacement="o" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
        replacement="i" replace="all"/>
    <!--filter class="solr.NGramFilterFactory" minGramSize="4"
        maxGramSize="4"/-->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

 <dynamicField name="*_n" type="ngram" indexed="true" stored="true"/>

It looks like it's generating phrases to me even though I have it set to
false.

James



James Bathgate | Sr. Developer

Toll Free (888) 643-9043 x610 - Fax (719) 358-2027

4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918
www.searchspring.net   http://www.searchspring.net


On Tue, Jul 2, 2013 at 2:47 PM, Jack Krupansky j...@basetechnology.comwrote:

 Ahhh... you put autoGeneratePhraseQueries=false on the field - but
 it needs to be on the field type.

 You can see from the parsed query that it generated the phrase.


 -- Jack Krupansky

 -Original Message- From: James Bathgate
 Sent: Tuesday, July 02, 2013 5:35 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Partial Matching in both query and field


 Jack,

 I've already tried that, here's my query:

 <str name="debugQuery">on</str>
 <str name="indent">on</str>
 <str name="start">0</str>
 <str name="q">0_extrafield1_n:20454</str>
 <str name="q.op">OR</str>
 <str name="rows">10</str>
 <str name="version">2.2</str>

 Here's the parsed query:

 <str name="parsedquery_toString">0_extrafield1_n:"2o45 o454 2o454"</str>

 Here's the applicable lines from schema.xml:

   <fieldType name="ngram" class="solr.TextField"
       positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
           ignoreCase="true" expand="true"/>
       <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="1" generateNumberParts="1" catenateWords="1"
           catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
           splitOnNumerics="0" preserveOriginal="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="0"
           replacement="o" replace="all"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
           replacement="i" replace="all"/>
       <filter class="solr.NGramFilterFactory" minGramSize="4"
           maxGramSize="16"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4"
           maxGramSize="16"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.PatternReplaceFilterFactory"
           pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
       <filter

Re: SolrJ 4.3 to Solr 1.4

2013-07-11 Thread Jonathan Rochkind

Huh, that might have been a false problem of some kind.

At the moment, it looks like I _do_ have my SolrJ 4.3 succesfully 
talking to a Solr 1.4, so long as I setParser(new XMLResponseParser()).


Not sure what I changed or what wasn't working before, but great!

So nevermind. Although if anyone reading this wants to share any other 
potential gotchas on solrj 4.3 talking to solr 1.4, feel free!


On 7/11/13 4:24 PM, Jonathan Rochkind wrote:

So, trying to use a SolrJ 4.3 to talk to an old Solr 1.4. Specifically
to add documents.

The wiki at http://wiki.apache.org/solr/Solrj suggests, I think, that
this should work, so long as you:

server.setParser(new XMLResponseParser());

However, when I do this, I still get a
org.apache.solr.common.SolrException: parsing error from
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:143)


(If I _don't_ setParser to XML, and use the binary parser... I get a
fully expected error about binary format corruption -- that part is
expected and I understand it, that's why you have to use the
XMLResponseParser instead).

Am I not doing enough to my SolrJ 4.3 to get it to talk to the Solr 1.4
server in pure XML? I've set the parser to the XMLResponseParser, do I
also have to somehow tell it to actually use the Solr 1.4 XML update
handler or something?  I don't entirely understand what I'm talking about.

Alternately... is it just a lost cause trying to get SolrJ 4.3 to talk
to Solr 1.4, is the wiki wrong that this is possible?

Thanks for any help,

Jonathan


Re: Partial Matching in both query and field

2013-07-11 Thread James Bathgate
I just noticed I pasted the wrong fieldType with the extra tokenizer not
commented out.

<fieldType name="ngram" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
        splitOnNumerics="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0"
        replacement="o" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
        replacement="i" replace="all"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
        maxGramSize="16"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4"
        maxGramSize="16"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0"
        replacement="o" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
        replacement="i" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>



James Bathgate | Sr. Developer

Toll Free (888) 643-9043 x610 - Fax (719) 358-2027

4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918
www.searchspring.net   http://www.searchspring.net


On Thu, Jul 11, 2013 at 2:15 PM, James Bathgate ja...@b7interactive.comwrote:

 Jack,

 This still isn't working. I just upgraded to 3.6.2 to verify that wasn't
 the issue.

 Here's query information:

 <lst name="params">
   <str name="debugQuery">on</str>
   <str name="indent">on</str>
   <str name="start">0</str>
   <str name="q">0_extrafield1_n:20454</str>
   <str name="rows">10</str>
   <str name="version">2.2</str>
 </lst>
 <result name="response" numFound="0" start="0"/>
 <lst name="debug">
   <str name="rawquerystring">0_extrafield1_n:20454</str>
   <str name="querystring">0_extrafield1_n:20454</str>
   <str name="parsedquery">PhraseQuery(0_extrafield1_n:"2o45 o454 2o454")</str>
   <str name="parsedquery_toString">0_extrafield1_n:"2o45 o454 2o454"</str>
   <lst name="explain"/>
   <str name="QParser">LuceneQParser</str>
 </lst>

 Here's the applicable lines from schema.xml:

 <fieldType name="ngram" class="solr.TextField"
     positionIncrementGap="100" autoGeneratePhraseQueries="false">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
         splitOnNumerics="0" preserveOriginal="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="0"
         replacement="o" replace="all"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
         replacement="i" replace="all"/>
     <filter class="solr.NGramFilterFactory" minGramSize="4"
         maxGramSize="16"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4"
         maxGramSize="16"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.PatternReplaceFilterFactory"
         pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="0"
         replacement="o" replace="all"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
         replacement="i" replace="all"/>
     <!--filter class="solr.NGramFilterFactory" minGramSize="4"
         maxGramSize="4"/-->
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

  <dynamicField name="*_n" type="ngram" indexed="true" stored="true"/>

 It looks like it's generating phrases to me even though I have it set to
 false.

 James



 James Bathgate | Sr. Developer

 Toll Free (888) 643-9043 x610 - Fax (719) 358-2027

 4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918
 www.searchspring.net   http://www.searchspring.net


 On Tue, Jul 2, 2013 at 2:47 PM, 

Re: Partial Matching in both query and field

2013-07-11 Thread Jack Krupansky

A couple of possibilities:

1. Make sure to reload the core.
2. Check that the Solr schema version is new enough to recognize 
autoGeneratePhraseQueries (see the note after this list).
3. What query parser are you using?
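
(For reference, hedged: the schema version is declared on the root element of 
schema.xml, e.g. <schema name="example" version="1.5">, and from version 1.4 
onward autoGeneratePhraseQueries defaults to false and the per-field-type 
setting is honored.)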

-- Jack Krupansky

-Original Message- 
From: James Bathgate

Sent: Thursday, July 11, 2013 5:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Partial Matching in both query and field

I just noticed I pasted the wrong fieldType with the extra tokenizer not
commented out.

   <fieldType name="ngram" class="solr.TextField"
       positionIncrementGap="100" autoGeneratePhraseQueries="false">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
           ignoreCase="true" expand="true"/>
       <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="1" generateNumberParts="1" catenateWords="1"
           catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
           splitOnNumerics="0" preserveOriginal="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="0"
           replacement="o" replace="all"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
           replacement="i" replace="all"/>
       <filter class="solr.NGramFilterFactory" minGramSize="4"
           maxGramSize="16"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4"
           maxGramSize="16"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.PatternReplaceFilterFactory"
           pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="0"
           replacement="o" replace="all"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
           replacement="i" replace="all"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>



James Bathgate | Sr. Developer

Toll Free (888) 643-9043 x610 - Fax (719) 358-2027

4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918
www.searchspring.net   http://www.searchspring.net


On Thu, Jul 11, 2013 at 2:15 PM, James Bathgate 
ja...@b7interactive.comwrote:



Jack,

This still isn't working. I just upgraded to 3.6.2 to verify that wasn't
the issue.

Here's query information:

<lst name="params">
  <str name="debugQuery">on</str>
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">0_extrafield1_n:20454</str>
  <str name="rows">10</str>
  <str name="version">2.2</str>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">0_extrafield1_n:20454</str>
  <str name="querystring">0_extrafield1_n:20454</str>
  <str name="parsedquery">PhraseQuery(0_extrafield1_n:"2o45 o454 2o454")</str>
  <str name="parsedquery_toString">0_extrafield1_n:"2o45 o454 2o454"</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
</lst>

Here's the applicable lines from schema.xml:

<fieldType name="ngram" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
        splitOnNumerics="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0"
        replacement="o" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
        replacement="i" replace="all"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
        maxGramSize="16"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4"
        maxGramSize="16"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0"
        replacement="o" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
        replacement="i" replace="all"/>
    <!--filter class="solr.NGramFilterFactory" minGramSize="4"
        maxGramSize="4"/-->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

 <dynamicField name="*_n" type="ngram" indexed="true" stored="true"/>

It looks like it's generating phrases to me even though I have it set 

Re: What does too many merges...stalling in indexwriter log mean?

2013-07-11 Thread Shawn Heisey

On 7/11/2013 1:47 PM, Tom Burton-West wrote:

We are seeing the message too many merges...stalling  in our indexwriter
log.   Is this something to be concerned about?  Does it mean we need to
tune something in our indexing configuration?


It sounds like you've run into the maximum number of simultaneous 
merges, which I believe defaults to two, or maybe three.  The following 
config section in <indexConfig> will likely take care of the issue. 
This assumes 3.6 or later; I believe that on older versions, this goes 
in <indexDefaults>.


  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>

Looking through the source code to confirm, this definitely seems to be 
the case.  Increasing maxMergeCount is likely going to speed up your 
indexing, at least by a little bit.  A value of 6 is probably high 
enough for mere mortals, but you guys don't do anything small, so I 
won't begin to speculate about what you'll need.


If you are using spinning disks, you'll want maxThreadCount at 1.  If 
you're using SSD, then you can likely increase that value.


Thanks,
Shawn



Re: SolrJ 4.3 to Solr 1.4

2013-07-11 Thread Shawn Heisey

On 7/11/2013 2:24 PM, Jonathan Rochkind wrote:

(If I _don't_ setParser to XML, and use the binary parser... I get a
fully expected error about binary format corruption -- that part is
expected and I understand it, that's why you have to use the
XMLResponseParser instead).

Am I not doing enough to my SolrJ 4.3 to get it to talk to the Solr 1.4
server in pure XML? I've set the parser to the XMLResponseParser, do I
also have to somehow tell it to actually use the Solr 1.4 XML update
handler or something?  I don't entirely understand what I'm talking about.


From everything I understand, it should be possible to make this work. 
 For XML updates, the handler should be /update on both 1.4 and 4.x. 
There might be some additional steps that need to be taken, but without 
more info I'm not sure what those steps might be.


Is there more to the client-side exception?  Do you see anything in the 
server-side logs?  If your server is logging at INFO, you should 
hopefully be able to see some of the actual request.  Can you share a 
larger snippet of your SolrJ code?


Thanks,
Shawn



Re: Partial Matching in both query and field

2013-07-11 Thread James Bathgate
1. My general process for a schema change (I know it's overkill) is delete
the data directory, reload, index data, reload again.

2. I'm using schema version 1.5 on Solr 3.6.2.

<schema name="SearchSpringDefault" version="1.5">

3. LuceneQParser, but I've also tried dismax and edismax.

Here's the solrQueryParser setting in my schema; I think OR is correct for
this:
<solrQueryParser defaultOperator="OR"/>

James



James Bathgate | Sr. Developer

Toll Free (888) 643-9043 x610 - Fax (719) 358-2027

4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918
www.searchspring.net   http://www.searchspring.net


On Thu, Jul 11, 2013 at 2:29 PM, Jack Krupansky j...@basetechnology.comwrote:

 A couple of possibilities:

 1. Make sure to reload the core.
 2. Check that the Solr schema version is new enough to recognize
 autoGeneratePhraseQueries.
 3. What query parser are you using?


 -- Jack Krupansky

 -Original Message- From: James Bathgate
 Sent: Thursday, July 11, 2013 5:26 PM

 To: solr-user@lucene.apache.org
 Subject: Re: Partial Matching in both query and field

 I just noticed I pasted the wrong fieldType with the extra tokenizer not
 commented out.

   <fieldType name="ngram" class="solr.TextField"
       positionIncrementGap="100" autoGeneratePhraseQueries="false">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
           ignoreCase="true" expand="true"/>
       <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="1" generateNumberParts="1" catenateWords="1"
           catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
           splitOnNumerics="0" preserveOriginal="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="0"
           replacement="o" replace="all"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
           replacement="i" replace="all"/>
       <filter class="solr.NGramFilterFactory" minGramSize="4"
           maxGramSize="16"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4"
           maxGramSize="16"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.PatternReplaceFilterFactory"
           pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="0"
           replacement="o" replace="all"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
           replacement="i" replace="all"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>



 James Bathgate | Sr. Developer

 Toll Free (888) 643-9043 x610 - Fax (719) 358-2027

 4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918
 www.searchspring.net   http://www.searchspring.net



 On Thu, Jul 11, 2013 at 2:15 PM, James Bathgate ja...@b7interactive.com*
 *wrote:

  Jack,

 This still isn't working. I just upgraded to 3.6.2 to verify that wasn't
 the issue.

 Here's query information:

 <lst name="params">
   <str name="debugQuery">on</str>
   <str name="indent">on</str>
   <str name="start">0</str>
   <str name="q">0_extrafield1_n:20454</str>
   <str name="rows">10</str>
   <str name="version">2.2</str>
 </lst>
 <result name="response" numFound="0" start="0"/>
 <lst name="debug">
   <str name="rawquerystring">0_extrafield1_n:20454</str>
   <str name="querystring">0_extrafield1_n:20454</str>
   <str name="parsedquery">PhraseQuery(0_extrafield1_n:"2o45 o454 2o454")</str>
   <str name="parsedquery_toString">0_extrafield1_n:"2o45 o454 2o454"</str>
   <lst name="explain"/>
   <str name="QParser">LuceneQParser</str>
 </lst>

 Here's the applicable lines from schema.xml:

 <fieldType name="ngram" class="solr.TextField"
     positionIncrementGap="100" autoGeneratePhraseQueries="false">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
         splitOnNumerics="0" preserveOriginal="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="0"
         replacement="o" replace="all"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
         replacement="i" replace="all"/>
     <filter class="solr.NGramFilterFactory" minGramSize="4"
         maxGramSize="16"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer

How to set a condition over stats result

2013-07-11 Thread Matt Lieber

Hello,

I am trying to see how I can test the sum of the values of an attribute
across docs, i.e. whether sum(myfieldvalue) > 100.

I know I can use the stats module, which compiles the sum of my attribute
on a certain facet, but how can I perform a test on this result (i.e. is
sum > 100) within my stats query? From what I read, applying a function to
the stats output is not supported yet.
Any other way to do this?

Cheers,
Matt














POST question

2013-07-11 Thread John Randall
I want to use a browser and use HTTP POST to add a single document (not a file) 
to Solr. I don't want to use cURL. I've made several attempts, such as the 
following:
 
http://localhost:8080/solr/update?commit=true&stream.type=text/xml;<add><doc><field
name="id">61234567</field><field name="title">WAR OF THE
WORLDS</field></doc></add>
 
 I get the following message, which makes it appear the POST was successful, but 
when I query on the id, there are no results. I've committed in a separate post 
too, but again, no results.
 
  <?xml version="1.0" encoding="UTF-8" ?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">15</int>
    </lst>
  </response>
 
It's probably a syntax error, but not sure.
 
I'm using Solr 3.6 on Windows XP SP 3. 
 
Any help would be appreciated.

RE: POST question

2013-07-11 Thread Roland Villemoes
Hi John,

You can't make a browser do an HTTP POST just by entering a URL; that is an 
HTTP GET.

So: use curl, or write a small application to do the HTTP POST. Or even 
better: use a browser plugin. Several of these exist. 
Example: the DEV HTTP CLIENT extension for Chrome. 

Roland Villemoes

-Original Message-
From: John Randall [mailto:jmr...@yahoo.com] 
Sent: 12. juli 2013 00:12
To: solr-user@lucene.apache.org
Subject: POST question

I want to use a browser and use HTTP POST to add a single document (not a file) 
to Solr. I don't want to use cURL. I've made several attempts, such as the 
following:
 
http://localhost:8080/solr/update?commit=true&stream.type=text/xml;<add><doc><field
name="id">61234567</field><field name="title">WAR OF THE
WORLDS</field></doc></add>
 
 I get following message which makes it appear the POST was successful, but 
when I query on the id, there are no results. I've commited in a separate post 
too, but again, no results.
 
  <?xml version="1.0" encoding="UTF-8" ?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">15</int>
    </lst>
  </response>
 
It's probably a syntax error, but not sure.
 
I'm using Solr 3.6 on Windows XP SP 3. 
 
Any help would be appreciated.


Re: POST question

2013-07-11 Thread Shawn Heisey

On 7/11/2013 4:12 PM, John Randall wrote:

I want to use a browser and use HTTP POST to add a single document (not a file) 
to Solr. I don't want to use cURL. I've made several attempts, such as the 
following:

http://localhost:8080/solr/update?commit=true&stream.type=text/xml;<add><doc><field
name="id">61234567</field><field name="title">WAR OF THE WORLDS</field></doc></add>

  I get following message which makes it appear the POST was successful, but 
when I query on the id, there are no results. I've commited in a separate post 
too, but again, no results.

   <?xml version="1.0" encoding="UTF-8" ?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">15</int>
     </lst>
   </response>


This is actually not a POST.  It's a GET -- that's the only kind of 
request you can make from a browser with a URL that's typed or pasted. 
In order to get a POST request from a browser, you need to have an HTML 
page with an HTML form in it and submit that form.  I'm not going to go 
into how to do this here, because that is basic HTML stuff.


If you use the stream.body parameter for your XML update, you might be 
able to use a GET request and have it actually work.


http://wiki.apache.org/solr/UpdateXmlMessages#Updating_via_GET

URL encoding the XML characters is required, as mentioned on that page.
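
For example, the document from the original post, URL-encoded into stream.body 
(same host and port as in the post; hedged, untested):

http://localhost:8080/solr/update?commit=true&stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name%3D%22id%22%3E61234567%3C%2Ffield%3E%3Cfield%20name%3D%22title%22%3EWAR%20OF%20THE%20WORLDS%3C%2Ffield%3E%3C%2Fdoc%3E%3C%2Fadd%3E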

I recently tried to do this myself on Solr 4.4-SNAPSHOT, and it didn't 
work.  I never did figure out why.  It's probably more likely to work on 
a 3.x version.


Thanks,
Shawn



Re: POST question

2013-07-11 Thread John Randall
I'll try the plugin. Thanks.




From: Roland Villemoes r...@alpha-solutions.dk
To: solr-user@lucene.apache.org solr-user@lucene.apache.org; John Randall 
jmr...@yahoo.com 
Sent: Thursday, July 11, 2013 6:21 PM
Subject: RE: POST question


Hi John,

You can't make a browser do an HTTP POST just by entering a URL; that is an 
HTTP GET.

So: use curl, or write a small application to do the HTTP POST. Or even 
better: use a browser plugin. Several of these exist. 
Example: the DEV HTTP CLIENT extension for Chrome. 

Roland Villemoes

-Original Message-
From: John Randall [mailto:jmr...@yahoo.com] 
Sent: 12. juli 2013 00:12
To: solr-user@lucene.apache.org
Subject: POST question

I want to use a browser and use HTTP POST to add a single document (not a file) 
to Solr. I don't want to use cURL. I've made several attempts, such as the 
following:
 
http://localhost:8080/solr/update?commit=true&stream.type=text/xml;<add><doc><field
name="id">61234567</field><field name="title">WAR OF THE
WORLDS</field></doc></add>
 
 I get following message which makes it appear the POST was successful, but 
when I query on the id, there are no results. I've commited in a separate post 
too, but again, no results.
 
  <?xml version="1.0" encoding="UTF-8" ?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">15</int>
    </lst>
  </response>
 
It's probably a syntax error, but not sure.
 
I'm using Solr 3.6 on Windows XP SP 3. 
 
Any help would be appreciated.

Re: POST question

2013-07-11 Thread John Randall
I'll probably move to Solr 4.x, so I'm going to try a plugin instead. Thanks 
for you insights.




From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org 
Sent: Thursday, July 11, 2013 6:28 PM
Subject: Re: POST question


On 7/11/2013 4:12 PM, John Randall wrote:
 I want to use a browser and use HTTP POST to add a single document (not a 
 file) to Solr. I don't want to use cURL. I've made several attempts, such as 
 the following:

 http://localhost:8080/solr/update?commit=true&stream.type=text/xml;<add><doc><field
 name="id">61234567</field><field name="title">WAR OF THE
 WORLDS</field></doc></add>

  I get following message which makes it appear the POST was successful, but 
when I query on the id, there are no results. I've commited in a separate post 
too, but again, no results.

   <?xml version="1.0" encoding="UTF-8" ?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">15</int>
     </lst>
   </response>

This is actually not a POST.  It's a GET -- that's the only kind of 
request you can make from a browser with a URL that's typed or pasted. 
In order to get a POST request from a browser, you need to have an HTML 
page with an HTML form in it and submit that form.  I'm not going to go 
into how to do this here, because that is basic HTML stuff.

If you use the stream.body parameter for your XML update, you might be 
able to use a GET request and have it actually work.

http://wiki.apache.org/solr/UpdateXmlMessages#Updating_via_GET

URL encoding the XML characters is required, as mentioned on that page.

I recently tried to do this myself on Solr 4.4-SNAPSHOT, and it didn't 
work.  I never did figure out why.  It's probably more likely to work on 
a 3.x version.

Thanks,
Shawn

preferred container for running SolrCloud

2013-07-11 Thread Ali, Saqib
1) Jboss
2) Jetty
3) Tomcat
4) Other..

?


Re: preferred container for running SolrCloud

2013-07-11 Thread Saikat Kanjilal
We're running under jetty.

Sent from my iPhone

On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com wrote:

 1) Jboss
 2) Jetty
 3) Tomcat
 4) Other..
 
 ?


Re: preferred container for running SolrCloud

2013-07-11 Thread Ali, Saqib
With the embedded Zookeeper or a separate Zookeeper? Also, have you run into
any issues running SolrCloud on jetty?


On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal sxk1...@hotmail.comwrote:

 We're running under jetty.

 Sent from my iPhone

 On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com wrote:

  1) Jboss
  2) Jetty
  3) Tomcat
  4) Other..
 
  ?



Re: preferred container for running SolrCloud

2013-07-11 Thread Anshum Gupta
On production, I'd highly recommend running ZK separately, as that gives you,
among other things, the liberty of shutting down a SolrCloud instance.
I haven't heard of or seen any SolrCloud issues while running it on jetty.
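
For what it's worth, a typical way to point a jetty-based Solr 4.x node at an
external ensemble, assuming the stock example layout (hostnames are
placeholders):

cd example
java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar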


On Fri, Jul 12, 2013 at 7:57 AM, Ali, Saqib docbook@gmail.com wrote:

 With the embedded Zookeeper or separate Zookeeper? Also have run into any
 issues with running SolrCloud on jetty?


 On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal sxk1...@hotmail.com
 wrote:

  We're running under jetty.
 
  Sent from my iPhone
 
  On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com wrote:
 
   1) Jboss
   2) Jetty
   3) Tomcat
   4) Other..
  
   ?
 




-- 

Anshum Gupta
http://www.anshumgupta.net


Re: preferred container for running SolrCloud

2013-07-11 Thread Walter Underwood
Embedded Zookeeper is only for dev. Production needs to run a ZK cluster.  
--wunder

On Jul 11, 2013, at 7:27 PM, Ali, Saqib wrote:

 With the embedded Zookeeper or separate Zookeeper? Also have run into any
 issues with running SolrCloud on jetty?
 
 
 On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal sxk1...@hotmail.comwrote:
 
 We're running under jetty.
 
 Sent from my iPhone
 
 On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com wrote:
 
 1) Jboss
 2) Jetty
 3) Tomcat
 4) Other..
 
 ?
 






Re: preferred container for running SolrCloud

2013-07-11 Thread Ali, Saqib
Thanks Walter. And the container..


On Thu, Jul 11, 2013 at 7:55 PM, Walter Underwood wun...@wunderwood.orgwrote:

 Embedded Zookeeper is only for dev. Production needs to run a ZK cluster.
  --wunder

 On Jul 11, 2013, at 7:27 PM, Ali, Saqib wrote:

  With the embedded Zookeeper or separate Zookeeper? Also have run into any
  issues with running SolrCloud on jetty?
 
 
  On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal sxk1...@hotmail.com
 wrote:
 
  We're running under jetty.
 
  Sent from my iPhone
 
  On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com
 wrote:
 
  1) Jboss
  2) Jetty
  3) Tomcat
  4) Other..
 
  ?
 







Re: preferred container for running SolrCloud

2013-07-11 Thread Walter Underwood
We use Tomcat for everything. It might not be the best, but it is what our Ops 
group is used to.

wunder

On Jul 11, 2013, at 7:58 PM, Ali, Saqib wrote:

 Thanks Walter. And the container..
 
 
 On Thu, Jul 11, 2013 at 7:55 PM, Walter Underwood 
 wun...@wunderwood.orgwrote:
 
 Embedded Zookeeper is only for dev. Production needs to run a ZK cluster.
 --wunder
 
 On Jul 11, 2013, at 7:27 PM, Ali, Saqib wrote:
 
 With the embedded Zookeeper or separate Zookeeper? Also have run into any
 issues with running SolrCloud on jetty?
 
 
 On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal sxk1...@hotmail.com
 wrote:
 
 We're running under jetty.
 
 Sent from my iPhone
 
 On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com
 wrote:
 
 1) Jboss
 2) Jetty
 3) Tomcat
 4) Other..
 
 ?
 
 
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





RE: preferred container for running SolrCloud

2013-07-11 Thread Saikat Kanjilal
Separate Zookeeper.

 Date: Thu, 11 Jul 2013 19:27:18 -0700
 Subject: Re: preferred container for running SolrCloud
 From: docbook@gmail.com
 To: solr-user@lucene.apache.org
 
 With the embedded Zookeeper or separate Zookeeper? Also have run into any
 issues with running SolrCloud on jetty?
 
 
 On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal sxk1...@hotmail.comwrote:
 
  We're running under jetty.
 
  Sent from my iPhone
 
  On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com wrote:
 
   1) Jboss
   2) Jetty
   3) Tomcat
   4) Other..
  
   ?
 
  

RE: preferred container for running SolrCloud

2013-07-11 Thread Saikat Kanjilal
One last thing: no issues with jetty.  The issues we did have were actually 
with running separate zookeeper clusters.

 From: sxk1...@hotmail.com
 To: solr-user@lucene.apache.org
 Subject: RE: preferred container for running SolrCloud
 Date: Thu, 11 Jul 2013 20:13:27 -0700
 
 Separate Zookeeper.
 
  Date: Thu, 11 Jul 2013 19:27:18 -0700
  Subject: Re: preferred container for running SolrCloud
  From: docbook@gmail.com
  To: solr-user@lucene.apache.org
  
  With the embedded Zookeeper or separate Zookeeper? Also have run into any
  issues with running SolrCloud on jetty?
  
  
  On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal sxk1...@hotmail.comwrote:
  
   We're running under jetty.
  
   Sent from my iPhone
  
   On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com wrote:
  
1) Jboss
2) Jetty
3) Tomcat
4) Other..
   
?
  
 
  

Re: How to set a condition over stats result

2013-07-11 Thread Jack Krupansky
None that I know of, short of writing a custom search component. Seriously, 
you could hack up a copy of the stats component with your own logic.


Actually... this may be a case for the new, proposed Script Request Handler, 
which would let you execute a query and then you could do any custom 
JavaScript logic you wanted.


When we get that feature, it might be interesting to implement a variation 
of the standard stats component as a JavaScript script, and then people 
could easily hack it such as in your request. Fascinating.


-- Jack Krupansky

-Original Message- 
From: Matt Lieber

Sent: Thursday, July 11, 2013 6:08 PM
To: solr-user@lucene.apache.org
Subject: How to set a condition over stats result




Hello,

I am trying to see how I can test the sum of values of an attribute across
docs.
I.e. Whether sum(myfieldvalue) > 100.

I know I can use the stats module which compiles the sum of my attributes
on a certain facet , but how can I perform a test this result (i.e. Is
sum > 100) within my stats query? From what I read, it's not supported yet
to perform a function on the stats module..
Any other way to do this ?

Cheers,
Matt















Re: How to set a condition over stats result

2013-07-11 Thread mihaela olteanu
What if you perform sub(sum(myfieldvalue),100) > 0 using frange?
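
(For reference, the frange syntax would be
fq={!frange l=0 incl=false}sub(sum(fieldA,fieldB),100)
with made-up field names. Note, though, that function queries such as sum()
evaluate per document, so this filters individual documents rather than
testing an aggregate over the whole result set.)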



 From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org 
Sent: Friday, July 12, 2013 7:44 AM
Subject: Re: How to set a condition over stats result
 

None that I know of, short of writing a custom search component. Seriously, you 
could hack up a copy of the stats component with your own logic.

Actually... this may be a case for the new, proposed Script Request Handler, 
which would let you execute a query and then you could do any custom JavaScript 
logic you wanted.

When we get that feature, it might be interesting to implement a variation of 
the standard stats component as a JavaScript script, and then people could 
easily hack it such as in your request. Fascinating.

-- Jack Krupansky

-Original Message- From: Matt Lieber
Sent: Thursday, July 11, 2013 6:08 PM
To: solr-user@lucene.apache.org
Subject: How to set a condition over stats result

 
Hello,

I am trying to see how I can test the sum of values of an attribute across
docs.
I.e. Whether sum(myfieldvalue) > 100.

I know I can use the stats module which compiles the sum of my attributes
on a certain facet , but how can I perform a test this result (i.e. Is
sum > 100) within my stats query? From what I read, it's not supported yet
to perform a function on the stats module..
Any other way to do this ?

Cheers,
Matt













How to set a condition on the number of docs found

2013-07-11 Thread Matt Lieber
Hello there,

I would like to be able to know whether I got over a certain threshold of
doc results.

I.e. Test (Result.numFound > 10) -> true.

Is there a way to do this? I can't seem to find one (other than doing
the test in the client app, which is not great).

Thanks,
Matt











Re: Usage of CloudSolrServer?

2013-07-11 Thread sathish_ix
Hi ,

I am using CloudSolrServer to connect to solrcloud, and I am indexing the
documents with the SolrJ API through the CloudSolrServer object. Indexing is
triggered on the master node of a collection, but when I need to find the
status of the loading, the message is returned from a replica where the
status is null. How can I find out which instance CloudSolrServer is
connecting to?
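
For what it's worth, a hedged sketch of how to inspect this (zk addresses and
collection name are placeholders): CloudSolrServer talks to ZooKeeper rather
than pinning one Solr node, and load-balances each request across live nodes,
so the instance it hits can vary per request. The cluster view it routes from
can be read through its ZkStateReader:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;

public class ClusterInspector {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        server.connect(); // forces the initial cluster-state read from ZooKeeper
        ClusterState state = server.getZkStateReader().getClusterState();
        System.out.println("Live nodes: " + state.getLiveNodes());
        server.shutdown();
    }
}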





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Usage-of-CloudSolrServer-tp4056052p4077471.html
Sent from the Solr - User mailing list archive at Nabble.com.