performance of json vs xml?

2011-12-11 Thread Jason Toy
I'm thinking about modifying my index process to use json because all my docs are originally in json anyway. Are there any performance issues if I insert json docs instead of xml docs? A colleague recommended that I stay with xml because solr is highly optimized for xml.

how to implement per doc weighting

2011-12-07 Thread Jason Toy
I've been reading the solr source code and made modifications by implementing a custom Similarity class. I want to apply a weight to the score by multiplying it by a number based on whether the current doc has a certain term in it. So if the query was q=data_text:foo then the Similarity class would

Re: joins and filter queries affecting scoring

2011-12-05 Thread Jason Toy
that contains all user documents that are active. This docset is used as a filter during the execution of the main query (q param), so it only returns posts that contain the text hello for active users. Martijn On 28 October 2011 01:57, Jason Toy jason...@gmail.com wrote: Does anyone have

getting lots of errors doing bulk insertion

2011-11-15 Thread Jason Toy
I've written a script that does bulk insertion from my database. It grabs chunks of 500 docs (out of 100 million) and inserts them into solr over http. I have 5 threads that are inserting from a queue. After each insert I issue a commit. Every 20 or so inserts I get this error message: Error:
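The setup described in this post (chunks of 500 docs, 5 threads pulling from a queue, a commit after each insert) could look roughly like the sketch below. It is an illustration only, not the poster's script: `send_batch` and `commit` are hypothetical callables standing in for the HTTP requests to solr, and the names `chunked` and `run_bulk_insert` are made up for this example.

```python
import queue
import threading


def chunked(docs, size=500):
    """Yield successive lists of at most `size` docs."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_bulk_insert(docs, send_batch, commit, workers=5, chunk_size=500):
    """Feed chunks to worker threads and commit after every insert.

    send_batch and commit are hypothetical callables that would wrap the
    HTTP calls to solr. Returns a list of (batch, exception) failures.
    """
    work = queue.Queue()
    failures = []
    lock = threading.Lock()

    def worker():
        while True:
            batch = work.get()
            if batch is None:  # sentinel: no more work for this thread
                break
            try:
                send_batch(batch)
                commit()
            except Exception as exc:
                with lock:
                    failures.append((batch, exc))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for batch in chunked(docs, chunk_size):
        work.put(batch)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return failures
```

The sketch mirrors the post by committing after every batch. Collecting failures instead of aborting makes it easier to see which batches hit the error; committing less often is a variation that may be worth testing.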

Re: Limit by score? sort by other field

2011-10-27 Thread Jason Toy
I have a similar problem except I need to filter scores that are too high. Robert Stewart bstewart...@gmail.com wrote on Oct 27, 2011 7:04 AM: BTW, this would be a good standard feature for SOLR, as I've run into this requirement more than once. On Oct 27, 2011, at 9:49 AM,

Re: joins and filter queries affecting scoring

2011-10-27 Thread Jason Toy
Does anyone have any idea on this issue? On Tue, Oct 25, 2011 at 11:40 AM, Jason Toy jason...@gmail.com wrote: Hi Yonik, Without a Join I would normally query user docs with: q=data_text:test&fq=is_active_boolean:true With joining users with posts, I get no results: q={!join from

Re: joins and filter queries affecting scoring

2011-10-25 Thread Jason Toy
, but with the ability to join with the Posts docs. On Tue, Oct 25, 2011 at 11:30 AM, Yonik Seeley yo...@lucidimagination.com wrote: Can you give an example of the request (URL) you are sending to Solr? -Yonik http://www.lucidimagination.com On Mon, Oct 24, 2011 at 3:31 PM, Jason Toy jason...@gmail.com

joins and filter queries affecting scoring

2011-10-24 Thread Jason Toy
I have 2 types of docs, users and posts. I want to view all the docs that belong to certain users by joining posts and users together. I have to filter the users with a filter query of is_active_boolean:true so that the score is not affected, but since I do a join, I have to move the filter query

composite Unique Keys?

2011-10-04 Thread Jason Toy
I have several different document types that I store. I use a serialized integer that is unique to the document type. If I use id as the uniqueKey, then docs of different types can collide on the id. What would be the best way to have a unique id given I am storing my unique identifier
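One way to picture the collision problem is to build the uniqueKey value from both the document type and the per-type integer before indexing. The sketch below is an assumption for illustration, not an answer taken from the thread; the function name `composite_key` and the `-` separator are hypothetical choices.

```python
def composite_key(doc_type, serial_id, sep="-"):
    """Combine a document type and its per-type integer into one id string."""
    if sep in doc_type:
        # Keeping the separator out of the type name keeps the id unambiguous.
        raise ValueError("doc_type must not contain the separator")
    return f"{doc_type}{sep}{serial_id}"


# Two docs with the same integer but different types no longer collide.
post_id = composite_key("Post", 42)
user_id = composite_key("User", 42)
```

The original integer can still be stored in its own field so that it stays available for lookups within a single type.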

I think I've found a bug with filter queries and joins

2011-09-30 Thread Jason Toy
I'm testing out the join functionality on svn revision 1175424. I've found that when I add a single filter query to a join it works fine, but when I use more than one filter query, the query does not return results. This single function query with a join returns results:

Re: dismax with AND/OR combination

2011-09-29 Thread Jason Toy
Can dismax understand that query in a translated form? On Sep 29, 2011 at 10:01 PM, yingshou guo guoyings...@gmail.com wrote: you can't use this kind of query syntax against dismax query parser. your query can be understood by standard query parser or edismax query parser. qt request parameter is

resource to see which versions build from trunk?

2011-09-24 Thread Jason Toy
Hi all, I am testing various versions of solr from trunk, and I am finding that oftentimes the example doesn't build and I can't test out the version. Is there a resource that shows which versions build correctly so that we can test them out?

what are the disadvantages of using dynamic fields?

2011-09-23 Thread Jason Toy
Hi all, I'd like to know what the specific disadvantages of using dynamic fields in my schema are. About half of my fields are dynamic, but I could move all of them to be static fields. Will my searches run faster? If there are no disadvantages, can I just set all my fields to be dynamic?

Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Jason Toy
I am running the sun version: java version 1.6.0_26 Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode) I get multiple Out of memory exceptions looking at my application and the solr logs, but my script doesn't get called the first

how to perform joins with function queries?

2011-09-20 Thread Jason Toy
I had a join query that was originally written as: {!join from=self_id_i to=user_id_i}data_text:hello and that works fine. I later added an fq filter: {!frange l=0.05 }div(termfreq(data_text,'hello'),max_i) and the query doesn't work anymore. If I do the fq by itself without the join the query
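Both parameters in this post contain characters (braces, spaces, quotes, commas) that need URL encoding when sent over http. A minimal sketch of building the request with the q and fq values quoted above; the host and the /solr/select path are assumptions borrowed from other posts in this list, and the sketch does not address why the combination returns no results.

```python
from urllib.parse import urlencode, parse_qs

# Values copied from the post; a list of pairs keeps the parameter order.
params = [
    ("q", "{!join from=self_id_i to=user_id_i}data_text:hello"),
    ("fq", "{!frange l=0.05 }div(termfreq(data_text,'hello'),max_i)"),
]

# urlencode percent-encodes the local-params braces, quotes and commas.
query_string = urlencode(params)

# Assumed endpoint, for illustration only.
url = "http://localhost:8983/solr/select?" + query_string

# Decoding the query string gives back the original values.
decoded = parse_qs(query_string)
```

Printing `url` and pasting it into a browser or curl is a quick way to rule out encoding problems before looking at the query parser itself.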

OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-16 Thread Jason Toy
I have solr issues where I keep running out of memory. I am working on solving the memory issues (this will take a long time), but in the meantime, I'm trying to be notified when the error occurs. I saw with the jvm I can pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time

Re: how would I use the new join feature given my schema.

2011-09-15 Thread Jason Toy
Anyone know the query I would do to get the join to work? I'm unable to get it to work. On Wed, Sep 14, 2011 at 10:49 AM, Jason Toy jason...@gmail.com wrote: I've been reading the information on the new join feature and am not quite sure how I would use it given my schema structure. I have

how would I use the new join feature given my schema.

2011-09-14 Thread Jason Toy
I've been reading the information on the new join feature and am not quite sure how I would use it given my schema structure. I have User docs and BlogPost docs and I want to return all BlogPosts that match the fulltext title cool that belong to Users that match the description solr. Here are the

using a function query with OR and spaces?

2011-09-13 Thread Jason Toy
I had queries breaking on me when there were spaces in the text I was searching for. Originally I had: fq=state_s:New York and that would break. I found a workaround by using: fq={!raw f=state_s}New York My problem now is doing this with an OR query, this is what I have now, but it doesn't
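With the standard Lucene query syntax, an unquoted space ends the fielded term, so state_s:New York is read as state_s:New followed by a separate term York. One common approach is to wrap each value in double quotes and group the alternatives. The sketch below assumes that approach; it is not necessarily the suggestion given in the replies, and "California" is a made-up second value for illustration.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def or_filter(field, values):
    """Build field:("a" OR "b") for one field and several exact values."""
    joined = " OR ".join(quote_term(v) for v in values)
    return f"{field}:({joined})"


# Hypothetical example values; only "New York" comes from the post.
fq = or_filter("state_s", ["New York", "California"])
```

The resulting string still has to be URL-encoded before it is sent as the fq parameter.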

Re: using a function query with OR and spaces?

2011-09-13 Thread Jason Toy
I wrote the title wrong, it's a filter query, not a function query, thanks for the correction. The field is a string, I had tried fq=stats_s:New York before and that did not work, I'm puzzled as to why this didn't work. I tried out your b suggestion and that worked, thanks! On Tue, Sep 13, 2011 at

Re: How to plug a new ANTLR grammar

2011-09-13 Thread Jason Toy
I'd love to see the progress on this. On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi, The standard lucene/solr parsing is nice but not really flexible. I saw questions and discussion about ANTLR, but unfortunately never a working grammar, so... maybe you find

syntax for functions used in the fq parameter

2011-08-26 Thread Jason Toy
I'm trying to limit my data to only docs that have the word 'foo' appear at least once. I am trying to use: fq=termfreq(data,'foo'):[1+TO+*] but I get the syntax error: Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered : : at line 1, column 33. Was expecting one of:
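A function such as termfreq is not a field name, so the field:[range] syntax cannot be applied to it directly. The frange parser, which appears in another post in this list, filters on a function's value instead. A minimal sketch, assuming frange is available in the solr version in use; the helper name `min_termfreq_filter` is hypothetical and q=*:* is only a placeholder query.

```python
from urllib.parse import urlencode, parse_qs


def min_termfreq_filter(field, term, minimum=1):
    """Return an fq value keeping docs where termfreq(field, term) >= minimum."""
    # l is the lower bound of the function range.
    return "{!frange l=%d}termfreq(%s,'%s')" % (minimum, field, term)


fq = min_termfreq_filter("data", "foo")

# URL-encode the filter together with a placeholder main query.
query_string = urlencode({"q": "*:*", "fq": fq})
```

If only the presence of the word matters, a plain fq=data:foo filter may be a simpler alternative.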

automatically dealing with out of memory exceptions

2011-08-24 Thread Jason Toy
After running a combination of different queries, my solr server eventually is unable to complete certain requests because it runs out of memory, which means I need to restart the server as it's basically useless with some queries working and not others. I am moving to a distributed setting soon,

solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
I have a large ec2 instance (7.5 GB RAM), it dies every few hours with out of heap memory issues. I started upping the min memory required, currently I use -Xms3072M. I insert about 50k docs an hour and I currently have about 65 million docs with about 10 fields each. Is this already too much

Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
. -Original Message- From: Jason Toy [mailto:jason...@gmail.com] Sent: Wednesday, August 17, 2011 5:15 PM To: solr-user@lucene.apache.org Subject: solr keeps dying every few hours. I have a large ec2 instance (7.5 GB RAM), it dies every few hours with out of heap memory issues. I

Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
I've only set the minimum memory and have not set the maximum memory. I'm doing more investigation and I see that I have 100+ dynamic fields for my documents, not the 10 fields I quoted earlier. I also sort against those dynamic fields often, I'm reading that this potentially uses a lot of memory.

Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
What can I do temporarily in this situation? It seems like I must eventually move to a distributed setup. I am sorting on dynamic float fields. On Wed, Aug 17, 2011 at 3:01 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Aug 17, 2011 at 5:56 PM, Jason Toy jason...@gmail.com wrote

is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music'), and I am unable to do it without passing in data to the q parameter. Is it possible to get a sorted list without searching for any terms?
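A reply later in this thread suggests the standard query parser with q=*:*, which matches every doc and leaves the ordering to the sort parameter. A rough sketch of such a request, using the function from the post verbatim; the rows value is an arbitrary assumption, and whether termfreq gives a useful value for a two-word argument depends on how post_text is analyzed.

```python
from urllib.parse import urlencode, parse_qs

params = {
    "q": "*:*",  # match all docs, as suggested in the reply
    "sort": "termfreq(post_text,'indie music') desc",  # function from the post
    "rows": 10,  # arbitrary page size for illustration
}

# Encode spaces, quotes and parentheses for use in a URL.
query_string = urlencode(params)

# Decoding returns the original parameter values.
decoded = parse_qs(query_string)
```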

bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
/8/8 Jason Toy jason...@gmail.com I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music'), and I am unable to do it without passing in data to the q parameter. Is it possible to get a sorted list without searching for any terms

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
your index size and number of unique terms. On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko ale...@superdownloads.com.br wrote: You can use the standard query parser and pass q=*:* 2011/8/8 Jason Toy jason...@gmail.com I am trying to list some data based

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
choices affect what is possible at query time. Lucene In Action is a pretty good book. On 8/8/2011 5:02 PM, Jason Toy wrote: Are not Dismax queries able to search for phrases using the default index (which is what I am using)? If I can already do phrase searches, I don't understand

getting result count only

2011-08-06 Thread Jason Toy
How can I run a query to get the result count only? I only need the count, so I don't need solr to send me all the results back.
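A common way to get only the count is to request zero rows and read numFound from the response. The sketch below assumes the JSON response writer (wt=json); the helper names, the base URL and the sample response body are made up for illustration.

```python
import json
from urllib.parse import urlencode


def count_url(base_url, query):
    """Build a select URL that asks for zero docs in JSON format."""
    return base_url + "?" + urlencode({"q": query, "rows": 0, "wt": "json"})


def extract_count(response_body):
    """Read the total hit count from a JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


# Hypothetical response body, shaped like a select response with rows=0.
sample = '{"responseHeader": {"status": 0}, "response": {"numFound": 1234, "start": 0, "docs": []}}'
total = extract_count(sample)
```

With rows=0 the docs list comes back empty, so only the header and the count travel over the wire.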

dealing with so many different sorting options

2011-07-29 Thread Jason Toy
As I'm using solr more and more, I'm finding that I need to do searches and then order by new criteria. So I am constantly adding new fields into solr and then reindexing everything. I want to know if adding all this data into solr is the normal way to deal with sorting. I'm finding that I

problem searching on non standard characters

2011-07-22 Thread Jason Toy
How does one search for words with characters like # and +? I have tried searching solr with #test and \#test but all my results always come up with test and not #test. Is this some kind of configuration option I need to set in solr?

saving timestamps in trunk broken?

2011-07-22 Thread Jason Toy
In Solr 1.3.1 I am able to store timestamps in my docs so that I can query them. In trunk when I try to store a doc with a timestamp I get a severe error. Is there a different way I should store this data or is this a bug? Jul 22, 2011 7:20:14 PM org.apache.solr.update.processor.LogUpdateProcessor

Re: saving timestamps in trunk broken?

2011-07-22 Thread Jason Toy
I haven't modified my schema in the older solr or trunk solr, is it required to modify my schema to support timestamps? On Fri, Jul 22, 2011 at 4:45 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : In Solr 1.3.1 I am able to store timestamps in my docs so that I query them. : : In trunk

Re: saving timestamps in trunk broken?

2011-07-22 Thread Jason Toy
=solr.DateField sortMissingLast=true omitNorms=true/ On Fri, Jul 22, 2011 at 5:00 PM, Jason Toy jason...@gmail.com wrote: I haven't modified my schema in the older solr or trunk solr,is it required to modify my schema to support timestamps? On Fri, Jul 22, 2011 at 4:45 PM, Chris Hostetter

Re: saving timestamps in trunk broken?

2011-07-22 Thread Jason Toy
Hi Chris, you were correct, the field was getting set as a double. Thanks for the help. On Fri, Jul 22, 2011 at 7:03 PM, Jason Toy jason...@gmail.com wrote: This is the document I am posting: ?xml version=1.0 encoding=UTF-8?adddocfield name=idPost 75004824785129473/fieldfield name=typePost

Re: I found a sorting bug in solr/lucene

2011-07-19 Thread Jason Toy
According to that bug list, there are other characters that break the sorting function. Is there a list of safe characters I can use as a delimiter? On Mon, Jul 18, 2011 at 1:31 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : When I try to sort by a column with a colon in it like :

I found a sorting bug in solr/lucene

2011-07-18 Thread Jason Toy
Hi all, I found a bug that exists in 3.1 and in trunk, but not in 1.4.1. When I try to sort by a column with a colon in it like scores:rails_f, solr cuts off the column name from the colon forward so scores:rails_f becomes scores. To test, I inserted this doc: In 1.4.1 I was able to

Re: I found a sorting bug in solr/lucene

2011-07-18 Thread Jason Toy
actually prohibited, but that could be your problem. Nick On 7/18/2011 8:10 AM, Jason Toy wrote: Hi all, I found a bug that exists in 3.1 and in trunk, but not in 1.4.1. When I try to sort by a column with a colon in it like scores:rails_f, solr cuts off the column name from

searching for google+

2011-07-18 Thread Jason Toy
How does one search for the term google+ with solr? I noticed on twitter I can search for google+: http://search.twitter.com/search?q=google%2B (which uses lucene, not sure about solr) but on my copy of solr, I can't search for google+

sorting by termfreq on trunk doesn't work?

2011-06-22 Thread Jason Toy
I am trying to use sorting by the termfreq function using the trunk code since termfreq was added in the 4.0 code base. I run this query: http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc but I get: HTTP ERROR 500 Problem accessing /solr/select/.

example doesn't run from source?

2011-06-19 Thread Jason Toy
I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run: java -jar start.jar and Jetty starts with: INFO::Started SocketConnector@0.0.0.0:8983 But then when I go to my browser and go to this address: http://localhost:8983/solr/ I get a 404 error.

can't determine sort order?

2011-06-11 Thread Jason Toy
I am trying to use sorting by function on solr 3.2 and it doesn't work with termfreq. I do this query: /solr/select?q=test&qf=all_lists_text&defType=dismax&sort=termfreq%28all_lists_text%2Ctest%29+desc&rows=50 I get this error: Can't determine Sort Order: 'termfreq(description_text,'test')

Re: how can I return function results in my query?

2011-06-10 Thread Jason Toy
Ahmet, that doesn't return the idf data in my results, unless I am doing something wrong. When you run any function, do you get the results of the function back? Can you show me an example query you run? //http://wiki.apache.org/solr/FunctionQuery#idf On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy

Re: how can I return function results in my query?

2011-06-10 Thread Jason Toy
the results of the function back? Can you show me an example query you run? //http://wiki.apache.org/solr/FunctionQuery#idf On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy jason...@gmail.com wrote: I want to be able to run a query like idf(text, 'term') and have that data returned with my

how can I return function results in my query?

2011-06-09 Thread Jason Toy
I want to be able to run a query like idf(text, 'term') and have that data returned with my search results. I've searched the docs, but I'm unable to find how to do it. Is this possible, and how can I do that?

found a bug in query parser upgrading from 1.4.1 to 3.1

2011-06-03 Thread Jason Toy
. For that reason I believe the bug is in solr and not in lucene. Jason Toy socmetrics http://socmetrics.com @jtoy