Re: [sqlite] Help forming query
also: if this query isn't *very* fast, then i'm fine with just "give me the value of the first cell where there is data in that column" in the below case, i'd get a "1". i would then proceed to delete everything with a "1" in it (from this and other tables) then i'd ask again, this time i'd get a 4, repeat above then ask again, get nothing, and i'd be done On Jun 17, 2014, at 10:54 PM, David M. Cotter wrote: > i have a table with a numeric column (not the key column) > i want to obtain from this table a list of unique numbers appearing in that > one column > > some cells in the column may have nothing, some may have duplicate numbers eg: > >> 1 >> 1 >> 1 >> 4 >> _ >> _ >> 4 >> _ > > note that "_" means "no data". i want to get a list with [1, 4] as the > result. what is the proper SQLite query for this? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Help forming query
i have a table with a numeric column (not the key column) i want to obtain from this table a list of unique numbers appearing in that one column some cells in the column may have nothing, some may have duplicate numbers eg: > 1 > 1 > 1 > 4 > _ > _ > 4 > _ note that "_" means "no data". i want to get a list with [1, 4] as the result. what is the proper SQLite query for this? -- kJams: Mac and Windows Karaoke: CD+G Rip, Mix & Burn! Main: http://kjams.com/wiki/ Downloads: http://kjams.com/downloads/ What's New: http://kjams.com/history/ To Unsubscribe: Simply reply with "kJams: unsubscribe" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Proximity ranking with FTS
Let's say I have search phrase "TermA TermB" matchinfo option 'p' would be 2. CREATE VIRTUAL TABLE t1 USING fts4(title, content); matchinfo option 'c' returns 2 for the number of columns. Now consider sample data: |1|""|"TermA"| |2|"TermA TermB"|"TermA TermA"| |3|"TermA TermA TermA"|"TermB"| matchinfo option 'x' would have ('p' * 'c' * 3) bytes of data per row. But each of these are aggregate pieces of information. For example in the list of (p)hrase terms I'm interested in the one at index [0] :: "TermA" in (c)olumn [0] for when matchinfo is looking at row [3]; I'm going to need a list of 3 token positions. But if there were more matches I'm going to need N token positions. So if matchinfo had a 't' option which is the total number of token hits within the row and this is an int then we can have option 'q' which would have the following data: int N = matchinfo[T_OFFSET]; for (int i = 0; i < N; i++) { // this is composed on my phone so pardon the poor indenting. int phraseTerm = matchinfo[Q_OFFSET + 3*i]; int column = matchinfo[Q_OFFSET + 3*i + 1]; int tokenPosition = matchinfo[Q_OFFSET + 3*i + 2]; } Again ideally this would be precomputed so matchinfo can maintain its speed in forming the BLOB. This is similar to how offsets() returns results but the documentation says that offsets() is an order of magnitude slower and I'm presuming it is using the fts3tokenize() on the matched results and tokenising the data again. A quick win would be to make a token_offsets() function that uses the fts3tokenise() function to get the values we are after by tokenising the results. Technically it'd get the job done but I'd like it to still have the speed matchinfo has so the proximity ranking isn't waiting on tokenising documents all the time. So if it is to be precalculated it will have to be stored in a shadow table somewhere and also updated accordingly with FTS4 INSERT, UPDATE and DELETE actions. Regards Josh -- View this message in context: http://sqlite.1065341.n5.nabble.com/Proximity-ranking-with-FTS-tp76149p76156.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Proximity ranking with FTS
This info is however not provided to an extension function. It would be great if the token number ¹t¹ could be added. Ben Am 17.06.14 19:36 schrieb "Dan Kennedy" unter : >On 06/17/2014 10:48 AM, Josh Wilson wrote: >> Yeah I had thought about using the byte distance between words but you >>get >> these instances: >> >> [Example A] >> |word1|10charword|word2| >> >> [Example B] >> |word1|3charword|4charword|3charword|word2| >> >> By using byte distances, both of these score the same, where Example A >> should score more highly. >> >> But it would seem I can use the fts3_tokenizer somehow to get the token >> positions or that this underlying value is available but just not >>stored in >> an accessible manner. > >I think it's possible to do. When it visits a row as part of a full-text >search, internally FTS has a list of matches within the current row for >each phrase in the query. Each match is stored as a column and token >offset - the number of tokens that precede the match within the column >text. > >Is that what you need? Do you have any ideas for an fts4 interface it? > >Dan. > > > > >> >> I implemented OkapiBM25f [1] but was hoping to implement something like >>the >> following proximity ranking [2] as it combines Bag-Of-Words ranking and >> proximity ranking. Although that article proposes to precalculate the >> distance pairs for all tokens, I'm happy to accept the TimeCost and >> calculate on the fly as that SpaceCost won't be worth it. >> >> [1] https://github.com/neozenith/sqlite-okapi-bm25 >> [2] http://infolab.stanford.edu/~theobald/pub/proximity-spire07.pdf >> >> >> >> -- >> View this message in context: >>http://sqlite.1065341.n5.nabble.com/Proximity-ranking-with-FTS-tp76149p76 >>152.html >> Sent from the SQLite mailing list archive at Nabble.com. >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > >___ >sqlite-users mailing list >sqlite-users@sqlite.org >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Proximity ranking with FTS
On 06/17/2014 10:48 AM, Josh Wilson wrote: Yeah I had thought about using the byte distance between words but you get these instances: [Example A] |word1|10charword|word2| [Example B] |word1|3charword|4charword|3charword|word2| By using byte distances, both of these score the same, where Example A should score more highly. But it would seem I can use the fts3_tokenizer somehow to get the token positions or that this underlying value is available but just not stored in an accessible manner. I think it's possible to do. When it visits a row as part of a full-text search, internally FTS has a list of matches within the current row for each phrase in the query. Each match is stored as a column and token offset - the number of tokens that precede the match within the column text. Is that what you need? Do you have any ideas for an fts4 interface it? Dan. I implemented OkapiBM25f [1] but was hoping to implement something like the following proximity ranking [2] as it combines Bag-Of-Words ranking and proximity ranking. Although that article proposes to precalculate the distance pairs for all tokens, I'm happy to accept the TimeCost and calculate on the fly as that SpaceCost won't be worth it. [1] https://github.com/neozenith/sqlite-okapi-bm25 [2] http://infolab.stanford.edu/~theobald/pub/proximity-spire07.pdf -- View this message in context: http://sqlite.1065341.n5.nabble.com/Proximity-ranking-with-FTS-tp76149p76152.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Linux sqlite3 executable return value - what does it return?
Simon Slavin wrote: > > On 16 Jun 2014, at 9:03pm, c...@isbd.net wrote: > > > > Remember - I said I'm doing an INSERT, either it will insert the > > intended data or there will be an error. > > But there are two types of error: the INSERT command may turn out to be > invalid (accidentally including a quote character or an unexpected semicolon) > or the INSERT command may be legitimate but the INSERT may violate a > constraint. > And it may be useful to know what type of error occurred. > This is a script that's going to be run once an hour by a cron job. Apart from the debugging stage there won't be any syntax errors. So, hopefully, any error will be some sort of data or system quirk that stops it working. All I want to know (in the script at least) is whether the data has been written by the INSERT, if it hasn't (i.e. the script sees an error) then the source data isn't deleted and I get an E-Mail from cron telling me something went wrong. -- Chris Green · ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users