[jira] [Created] (IGNITE-4517) Add ability to execute SQL queries on certain partition(s)

2017-01-02 Thread Vladimir Ozerov (JIRA)
Vladimir Ozerov created IGNITE-4517:
---

 Summary: Add ability to execute SQL queries on certain partition(s)
 Key: IGNITE-4517
 URL: https://issues.apache.org/jira/browse/IGNITE-4517
 Project: Ignite
  Issue Type: Task
  Components: SQL
Reporter: Vladimir Ozerov
 Fix For: 2.0


*Motivation*
This could be useful for certain cases:
1) Simple queries where the partition can be determined easily in advance, 
either automatically (IGNITE-4509, IGNITE-4510) or manually.
2) Spark DataFrame integration (IGNITE-3084).

*Proposed API*
class Query {
    int[] partitions();
    void partitions(int... parts);
}

Important points:
1) Partitions are defined in the base {{Query}} class because we already 
have this feature for {{ScanQuery}} and potentially any query type can benefit 
from it. If a query doesn't support partitions, an exception should be thrown.
2) The user should be able to specify multiple partitions, not only one. This 
makes our API more flexible for third-party integrations like Spark. It also 
helps users with fine-grained tuning. E.g., if a user has a query {{... WHERE 
attribute IN (?, ?, ...)}}, they can determine the partitions for the {{IN}} 
arguments in advance.

Probably this feature should not be supported for distributed joins. On the 
other hand, why not? A query is always created from some cache, so the first 
map step can be executed only against the target partitions, and the rest of 
the execution flow can go through all partitions of the other caches.
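
The proposed accessor pair could be sketched roughly as follows. This is only an illustration under stated assumptions: {{QuerySketch}}, {{SqlQuerySketch}}, {{TextQuerySketch}}, the {{supportsPartitions()}} hook and the {{partitionsFor}} helper are hypothetical names, and the hash-modulo mapping merely stands in for the real affinity function of the cache.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical base class; it only illustrates the proposed accessor pair. */
abstract class QuerySketch {
    /** Target partitions, or null when the query runs on all partitions. */
    private int[] parts;

    /** Returns a copy of the target partitions, or null if none were set. */
    int[] partitions() {
        return parts == null ? null : parts.clone();
    }

    /** Sets target partitions, failing fast if this query type cannot honor them. */
    void partitions(int... parts) {
        if (!supportsPartitions())
            throw new UnsupportedOperationException(
                "Partitions are not supported by " + getClass().getSimpleName());

        this.parts = parts == null ? null : parts.clone();
    }

    /** Assumed extension hook: each query type declares whether it supports partitions. */
    abstract boolean supportsPartitions();

    /**
     * Maps IN-clause arguments to the distinct partitions they belong to.
     * The hash-modulo mapping is an assumption standing in for cache affinity.
     */
    static int[] partitionsFor(Object[] keys, int partCnt) {
        Set<Integer> res = new TreeSet<>();

        for (Object key : keys)
            res.add(Math.floorMod(key.hashCode(), partCnt));

        int[] arr = new int[res.size()];
        int i = 0;

        for (int p : res)
            arr[i++] = p;

        return arr;
    }
}

/** Hypothetical query type that supports partitions. */
final class SqlQuerySketch extends QuerySketch {
    @Override boolean supportsPartitions() { return true; }
}

/** Hypothetical query type that does not support partitions. */
final class TextQuerySketch extends QuerySketch {
    @Override boolean supportsPartitions() { return false; }
}
```

With this shape, a caller would compute {{partitionsFor(inArgs, partCnt)}} once and pass the result to {{partitions(...)}} before executing the query; a query type that cannot honor partitions rejects the call immediately instead of silently scanning everything.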



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (IGNITE-4354) DML: BinaryObjectBuilder does not sort fields in some cases

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-4354:

Component/s: binary

> DML: BinaryObjectBuilder does not sort fields in some cases
> ---
>
> Key: IGNITE-4354
> URL: https://issues.apache.org/jira/browse/IGNITE-4354
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, SQL
>Affects Versions: 1.8
>Reporter: Pavel Tupitsyn
>  Labels: DML
> Fix For: 2.0
>
>
> One of the setField methods uses TreeMap<> to sort fields, depending on 
> BinaryUtils.FIELDS_SORTED_ORDER.
> However, there are other places where assignedVals is initialized (setField, 
> removeField) which are not updated to use TreeMap.
> Make sure to put this logic in one place.
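
A minimal sketch of what "one place" could look like, assuming a hypothetical builder class. Only the names {{assignedVals}}, {{setField}}, {{removeField}} and {{FIELDS_SORTED_ORDER}} come from the ticket; everything else, including the simplified removal semantics, is illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical builder; it only shows keeping the map-creation decision in one method. */
final class FieldMapBuilderSketch {
    /** Stand-in (assumption) for BinaryUtils.FIELDS_SORTED_ORDER. */
    private final boolean fieldsSortedOrder;

    /** Lazily created map of assigned field values. */
    private Map<String, Object> assignedVals;

    FieldMapBuilderSketch(boolean fieldsSortedOrder) {
        this.fieldsSortedOrder = fieldsSortedOrder;
    }

    /** The only place that decides which map implementation backs assignedVals. */
    private Map<String, Object> assignedValues() {
        if (assignedVals == null)
            assignedVals = fieldsSortedOrder
                ? new TreeMap<String, Object>()
                : new LinkedHashMap<String, Object>();

        return assignedVals;
    }

    /** Every mutating method goes through assignedValues(). */
    FieldMapBuilderSketch setField(String name, Object val) {
        assignedValues().put(name, val);

        return this;
    }

    /** Simplified removal; the real builder may use a marker value instead. */
    FieldMapBuilderSketch removeField(String name) {
        assignedValues().remove(name);

        return this;
    }

    /** Field names in the iteration order of the backing map. */
    List<String> fieldNames() {
        return new ArrayList<>(assignedValues().keySet());
    }
}
```

Because both mutators obtain the map through the same accessor, a new initialization path cannot accidentally bypass the sorted-order setting.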





[jira] [Updated] (IGNITE-4164) Add support for parallel loading of caches using custom SQL load queries in org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore#loadCache

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-4164:

Component/s: (was: SQL)
 cache

> Add support for parallel loading of caches using custom SQL load queries in 
> org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore#loadCache
> ---
>
> Key: IGNITE-4164
> URL: https://issues.apache.org/jira/browse/IGNITE-4164
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 1.7
>Reporter: Anghel Botos
>
> Please add support for parallel loading of caches using custom SQL load 
> queries in 
> org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore#loadCache. At the 
> moment this is not possible, because the current implementation performs the 
> load for each entity type in the cache using a single 
> {{LoadCacheCustomQueryWorker}}. By contrast, when no custom SQL queries are 
> provided for the load, the load for each entity type is distributed across 
> several threads based on some ranges.
> While it may not be possible to support parallel load with an arbitrary 
> custom SQL query (as this would mean that Ignite would have to somehow 
> understand the meaning of that query), it would still be a significant 
> improvement if parallel load were possible when a custom {{WHERE}} clause is 
> provided for each entity type (instead of a full custom query).
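
One way to picture the requested behavior is the sketch below, which combines a user-supplied {{WHERE}} clause with generated key ranges and runs the resulting queries on a thread pool. It is not the store's actual implementation: {{RangeLoadSketch}}, its method names and the numeric key column are assumptions, and the arithmetic ignores overflow for ranges close to the limits of {{long}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical helper; not part of CacheAbstractJdbcStore. */
final class RangeLoadSketch {
    /** Splits [min, max] into at most 'parts' key ranges and ANDs each with the custom WHERE clause. */
    static List<String> rangeQueries(String table, String keyCol, String customWhere,
        long min, long max, int parts) {
        if (parts <= 0 || min > max)
            throw new IllegalArgumentException("Invalid range or number of parts");

        // Ceiling of (max - min + 1) / parts, so the whole range is covered.
        long step = (max - min + parts) / parts;

        List<String> res = new ArrayList<>();

        for (long lo = min; lo <= max; lo += step) {
            long hi = Math.min(lo + step - 1, max);

            res.add("SELECT * FROM " + table + " WHERE (" + customWhere + ") AND "
                + keyCol + " BETWEEN " + lo + " AND " + hi);
        }

        return res;
    }

    /** Runs one loader call per query on a fixed pool and waits for all of them. */
    static void loadInParallel(List<String> queries, int threads, Consumer<String> loader) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        try {
            List<Future<?>> futs = new ArrayList<>();

            for (String qry : queries)
                futs.add(pool.submit(() -> loader.accept(qry)));

            for (Future<?> fut : futs)
                fut.get();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted while loading", e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException("Load failed", e.getCause());
        }
        finally {
            pool.shutdown();
        }
    }
}
```

Since Ignite only appends a range predicate to the user's clause, it does not need to understand the clause itself, which is the point made in the ticket.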





[jira] [Updated] (IGNITE-4150) B-Tree index cannot be used efficiently with IN clause.

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-4150:

Description: 
Consider the following query:
{code}
SELECT * FROM table
WHERE a = ? AND b IN (?, ?)
{code}

If there is an index {{(a, b)}}, it will not be used properly: only column 
{{a}} will be used. This leads to multiple unnecessary comparisons.

The most obvious way to fix that is to use a temporary table and a {{JOIN}}. 
However, this approach doesn't work well when there are multiple {{IN}} clauses. 

A proper solution would be to hack deeper into H2.

  was:
Consider the following query:
{code}
SELECT * FROM table
WHERE a = ? AND b IN (?, ?)
{code}

If there is an index {{(a, b)}}, it will not be used properly: only column 
{{a}} will be used. This leads to multiple unnecessary comparisons.

The most obvious way to fix that is to use a temporary table and a {{JOIN}}. 
However, this approach doesn't work well when there are multiple {{IN}}s. 

A proper solution would be to hack deeper into H2.


> B-Tree index cannot be used efficiently with IN clause.
> ---
>
> Key: IGNITE-4150
> URL: https://issues.apache.org/jira/browse/IGNITE-4150
> Project: Ignite
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 1.7
>Reporter: Vladimir Ozerov
>  Labels: performance
> Fix For: 2.0
>
>
> Consider the following query:
> {code}
> SELECT * FROM table
> WHERE a = ? AND b IN (?, ?)
> {code}
> If there is an index {{(a, b)}}, it will not be used properly: only column 
> {{a}} will be used. This leads to multiple unnecessary comparisons.
> The most obvious way to fix that is to use a temporary table and a {{JOIN}}. 
> However, this approach doesn't work well when there are multiple {{IN}} 
> clauses. 
> A proper solution would be to hack deeper into H2.
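
To make the cost difference concrete, here is a toy model of a composite index built on a {{TreeMap}}. It does not reflect H2 internals. The class and method names are hypothetical, and the index is assumed to be unique on {{(a, b)}}. The first method uses only the {{a}} prefix and filters {{b}} afterwards; the second performs one point lookup per {{IN}} value.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of a sorted (a, b) index; not how H2 stores or scans indexes. */
final class InClauseIndexSketch {
    /** Composite key ordered by a, then by b. */
    static final class Key implements Comparable<Key> {
        final int a;
        final int b;

        Key(int a, int b) {
            this.a = a;
            this.b = b;
        }

        @Override public int compareTo(Key o) {
            int c = Integer.compare(a, o.a);

            return c != 0 ? c : Integer.compare(b, o.b);
        }
    }

    /** Uses only column a: visits every entry with the given a, then filters by b. */
    static int scanByPrefix(TreeMap<Key, String> idx, int a, Set<Integer> bs, List<String> out) {
        int visited = 0;

        Map<Key, String> range = idx.subMap(
            new Key(a, Integer.MIN_VALUE), true, new Key(a, Integer.MAX_VALUE), true);

        for (Map.Entry<Key, String> e : range.entrySet()) {
            visited++;

            if (bs.contains(e.getKey().b))
                out.add(e.getValue());
        }

        return visited;
    }

    /** Uses both columns: one point lookup per IN value. */
    static int seekPerValue(TreeMap<Key, String> idx, int a, Set<Integer> bs, List<String> out) {
        int visited = 0;

        for (int b : bs) {
            String row = idx.get(new Key(a, b));

            if (row != null) {
                visited++;
                out.add(row);
            }
        }

        return visited;
    }
}
```

With 100 rows sharing the same {{a}} and two {{IN}} values, the prefix scan touches all 100 entries while the per-value seek touches only the two matching ones.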





[jira] [Resolved] (IGNITE-4032) SQL performance issues

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov resolved IGNITE-4032.
-
Resolution: Won't Fix

This was an umbrella ticket. All related issues have been converted to tasks, 
so I am closing this one.

> SQL performance issues
> --
>
> Key: IGNITE-4032
> URL: https://issues.apache.org/jira/browse/IGNITE-4032
> Project: Ignite
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Andrew Mashenkov
>  Labels: performance
>
> Umbrella ticket





[jira] [Closed] (IGNITE-4032) SQL performance issues

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov closed IGNITE-4032.
---

> SQL performance issues
> --
>
> Key: IGNITE-4032
> URL: https://issues.apache.org/jira/browse/IGNITE-4032
> Project: Ignite
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Andrew Mashenkov
>  Labels: performance
>
> Umbrella ticket





[jira] [Updated] (IGNITE-4035) SQL: Avoid excessive calls of deterministic functions on same arguments

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-4035:

Issue Type: Task  (was: Sub-task)
Parent: (was: IGNITE-4032)

> SQL: Avoid excessive calls of deterministic functions on same arguments
> ---
>
> Key: IGNITE-4035
> URL: https://issues.apache.org/jira/browse/IGNITE-4035
> Project: Ignite
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 1.6, 1.7
>Reporter: Andrew Mashenkov
>  Labels: performance
>
> In the SQL query example below, the heavy deterministic function "datediff" 
> is called 4 times per row. I'd expect the function to be called once per row. 
> Example:
> {noformat}
> Select
>   avg(datediff('s',ts1,ts2)) as avg_diff,
>   min(datediff('s',ts1,ts2)) as min_diff,
>   max(datediff('s',ts1,ts2)) as max_diff
> From table
> {noformat}
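
The expected behavior can be illustrated outside of SQL: evaluate the deterministic expression once per row and feed all aggregates from that single value. The sketch below is not how the query engine is structured; {{SharedExpressionSketch}}, {{Stats}} and the two-column row layout are assumptions made for the example.

```java
import java.util.function.LongBinaryOperator;

/** Illustrative only: shares one evaluation of an expensive function across aggregates. */
final class SharedExpressionSketch {
    /** Holder for the three aggregates from the example query. */
    static final class Stats {
        long count;
        long sum;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;

        double avg() {
            return count == 0 ? Double.NaN : (double) sum / count;
        }
    }

    /**
     * Each row is {ts1, ts2}. The function 'diff' is assumed to be deterministic,
     * so it is evaluated exactly once per row and reused by avg, min and max.
     */
    static Stats aggregate(long[][] rows, LongBinaryOperator diff) {
        Stats s = new Stats();

        for (long[] row : rows) {
            long d = diff.applyAsLong(row[0], row[1]); // Single evaluation per row.

            s.count++;
            s.sum += d;
            s.min = Math.min(s.min, d);
            s.max = Math.max(s.max, d);
        }

        return s;
    }
}
```

Reusing the value is only safe because the function is deterministic; a non-deterministic function would have to be evaluated for every occurrence.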





[jira] [Updated] (IGNITE-3959) SQL: Optimize Date\Time fields conversion.

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-3959:

Issue Type: Task  (was: Sub-task)
Parent: (was: IGNITE-4032)

> SQL: Optimize Date\Time fields conversion.
> --
>
> Key: IGNITE-3959
> URL: https://issues.apache.org/jira/browse/IGNITE-3959
> Project: Ignite
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 1.6
>Reporter: Andrew Mashenkov
>  Labels: newbie, performance
>
> SqlFieldsQuery execution slows down when processing date\time fields because 
> the H2 database uses java.util.Calendar inefficiently for date manipulation.
> A good starting point is the IgniteH2Indexing.wrap() method. Optimize the 
> DATE and TIME types the same way it is already done for the TIMESTAMP type.





[jira] [Updated] (IGNITE-4031) SQL: Optimize Date\Time\Timestamp function arguments conversion.

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-4031:

Issue Type: Task  (was: Bug)

> SQL: Optimize Date\Time\Timestamp function arguments conversion.
> 
>
> Key: IGNITE-4031
> URL: https://issues.apache.org/jira/browse/IGNITE-4031
> Project: Ignite
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 1.6
>Reporter: Andrew Mashenkov
>  Labels: performance
> Fix For: 2.0
>
>
> When a SQL function with Date\Time\Timestamp arguments is used, H2 internals 
> convert these objects using java.util.Calendar before passing them as 
> arguments.
> In the H2 version we currently use, DateTimeUtils holds a cached 
> java.util.Calendar instance in a static field and synchronizes every 
> operation on it.
> If it's possible, we need a workaround to use java.util.Calendar more 
> efficiently, as is done for timestamp fields. See the 
> IgniteH2Indexing.wrap() method.
> Starting point: GridSQLQueryParser.FUNC_ALIAS
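
One generic way to avoid contention on a shared static {{Calendar}} is to keep one instance per thread. The sketch below shows that technique only; it is an assumption about a possible workaround, not a description of what {{IgniteH2Indexing.wrap()}} or H2 actually do. The class name and the fixed UTC time zone are illustrative.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Illustrative only: per-thread Calendar instead of a synchronized static one. */
final class CalendarCacheSketch {
    /** One Calendar per thread, so no synchronization is needed. UTC is an assumption. */
    private static final ThreadLocal<Calendar> CAL = new ThreadLocal<Calendar>() {
        @Override protected Calendar initialValue() {
            return new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        }
    };

    /** Returns {year, month (1-12), day of month} for the given epoch milliseconds. */
    static int[] dateParts(long epochMillis) {
        Calendar cal = CAL.get();

        cal.setTimeInMillis(epochMillis);

        return new int[] {
            cal.get(Calendar.YEAR),
            cal.get(Calendar.MONTH) + 1, // Calendar months are zero-based.
            cal.get(Calendar.DAY_OF_MONTH)
        };
    }
}
```

The trade-off is one {{Calendar}} instance per thread in exchange for removing the lock from every conversion.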



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (IGNITE-4031) SQL: Optimize Date\Time\Timestamp function arguments conversion.

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-4031:

Labels: performance  (was: performance sql)

> SQL: Optimize Date\Time\Timestamp function arguments conversion.
> 
>
> Key: IGNITE-4031
> URL: https://issues.apache.org/jira/browse/IGNITE-4031
> Project: Ignite
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 1.6
>Reporter: Andrew Mashenkov
>  Labels: performance
> Fix For: 2.0
>
>
> When a SQL function with Date\Time\Timestamp arguments is used, H2 internals 
> convert these objects using java.util.Calendar before passing them as 
> arguments.
> In the H2 version we currently use, DateTimeUtils holds a cached 
> java.util.Calendar instance in a static field and synchronizes every 
> operation on it.
> If it's possible, we need a workaround to use java.util.Calendar more 
> efficiently, as is done for timestamp fields. See the 
> IgniteH2Indexing.wrap() method.
> Starting point: GridSQLQueryParser.FUNC_ALIAS





[jira] [Updated] (IGNITE-4031) SQL: Optimize Date\Time\Timestamp function arguments conversion.

2017-01-02 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-4031:

Issue Type: Bug  (was: Sub-task)
Parent: (was: IGNITE-4032)

> SQL: Optimize Date\Time\Timestamp function arguments conversion.
> 
>
> Key: IGNITE-4031
> URL: https://issues.apache.org/jira/browse/IGNITE-4031
> Project: Ignite
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6
>Reporter: Andrew Mashenkov
>  Labels: performance
> Fix For: 2.0
>
>
> When a SQL function with Date\Time\Timestamp arguments is used, H2 internals 
> convert these objects using java.util.Calendar before passing them as 
> arguments.
> In the H2 version we currently use, DateTimeUtils holds a cached 
> java.util.Calendar instance in a static field and synchronizes every 
> operation on it.
> If it's possible, we need a workaround to use java.util.Calendar more 
> efficiently, as is done for timestamp fields. See the 
> IgniteH2Indexing.wrap() method.
> Starting point: GridSQLQueryParser.FUNC_ALIAS





[jira] [Commented] (IGNITE-3084) Investigate how Ignite can support Spark DataFrame

2017-01-02 Thread Vladimir Ozerov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794332#comment-15794332
 ] 

Vladimir Ozerov commented on IGNITE-3084:
-

Val,

Cool analysis! I would say that executing a query on a partition is a very 
useful feature. Not only will it help us with Spark, it will also allow us to 
perform certain useful SQL optimizations (e.g. IGNITE-4509 and IGNITE-4510). 

I am not quite sure I understand how to work with plans and strategies. Does it 
mean that we will have to analyze SQL somehow (e.g. build an AST) to give 
correct hints to Spark?


> Investigate how Ignite can support Spark DataFrame
> --
>
> Key: IGNITE-3084
> URL: https://issues.apache.org/jira/browse/IGNITE-3084
> Project: Ignite
>  Issue Type: Task
>  Components: Ignite RDD
>Affects Versions: 1.5.0.final
>Reporter: Vladimir Ozerov
>Assignee: Valentin Kulichenko
>  Labels: bigdata
> Fix For: 2.0
>
>
> We see increasing demand for good DataFrame support in our Spark integration. 
> We need to investigate how we could do that.
> It looks like we can study how MemSQL does it and take that as a starting 
> point.





[jira] [Commented] (IGNITE-3084) Investigate how Ignite can support Spark DataFrame

2017-01-02 Thread Valentin Kulichenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794210#comment-15794210
 ] 

Valentin Kulichenko commented on IGNITE-3084:
-

I did some investigation, and here is what, in my view, needs to be done to 
support integration between Ignite and Spark DataFrame.

# Provide an implementation of {{BaseRelation}} mixed with 
{{PrunedFilteredScan}}. It should be able to execute a query based on the 
provided filters and selected fields and return an RDD that iterates through 
the results. Since an RDD works at the per-partition level, most likely we 
will need to add the ability to run a SQL query on a particular partition.
# Provide an implementation of {{Catalog}} to properly look up Ignite relations.
# Create an {{IgniteSQLContext}} that will override the catalog.

The steps above will add a new data source to Spark. However, while Spark is 
executing a query, it generally first fetches data from the source into its own 
memory to create RDDs. Therefore this is not enough for Ignite, because we 
already have the data in memory. If only Ignite data participates in the query, 
we want Spark to issue the query directly to Ignite.

To accomplish this, we can provide our own implementation of {{Strategy}}, 
which Spark uses to convert a logical plan to a physical plan. For any type of 
{{LogicalPlan}}, this custom strategy should be able to generate a SQL query 
for Ignite based on the whole plan tree. If there are non-Ignite relations in 
the plan, we should fall back to the native Spark strategies (return {{Nil}} as 
the physical plan).

{{IgniteSQLContext}} should append the custom strategy to the collection of 
Spark strategies. Here is a good example of how a custom strategy can be 
created and injected: https://gist.github.com/marmbrus/f3d121a1bc5b6d6b57b9
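
The query-building part of the first step can be sketched independently of Spark. The code below is plain Java and does not use the Spark API: {{PrunedScanSqlSketch}}, {{Filter}} and {{SqlAndArgs}} are hypothetical stand-ins for the required columns and pushed-down filters that {{PrunedFilteredScan}} receives, and only simple binary comparisons are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of turning pruned columns and pushed-down filters into SQL. */
final class PrunedScanSqlSketch {
    /** Hypothetical stand-in for a pushed-down comparison such as "age > 30". */
    static final class Filter {
        final String column;
        final String op;
        final Object value;

        Filter(String column, String op, Object value) {
            this.column = column;
            this.op = op;
            this.value = value;
        }
    }

    /** SQL text with '?' placeholders plus the arguments to bind, in order. */
    static final class SqlAndArgs {
        final String sql;
        final List<Object> args;

        SqlAndArgs(String sql, List<Object> args) {
            this.sql = sql;
            this.args = args;
        }
    }

    /** Selects only the required columns and ANDs all filters into the WHERE clause. */
    static SqlAndArgs buildScan(String table, String[] requiredColumns, List<Filter> filters) {
        StringBuilder sql = new StringBuilder("SELECT ");

        // Selecting everything when no column is requested is a simplification.
        sql.append(requiredColumns.length == 0 ? "*" : String.join(", ", requiredColumns));
        sql.append(" FROM ").append(table);

        List<Object> args = new ArrayList<>();

        for (int i = 0; i < filters.size(); i++) {
            Filter f = filters.get(i);

            sql.append(i == 0 ? " WHERE " : " AND ")
                .append(f.column).append(' ').append(f.op).append(" ?");

            args.add(f.value); // Values are bound as parameters, never concatenated.
        }

        return new SqlAndArgs(sql.toString(), args);
    }
}
```

In an actual relation, each RDD partition would presumably execute the generated query against a single Ignite partition, which is why the ability to run SQL on a particular partition is listed as a prerequisite.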

> Investigate how Ignite can support Spark DataFrame
> --
>
> Key: IGNITE-3084
> URL: https://issues.apache.org/jira/browse/IGNITE-3084
> Project: Ignite
>  Issue Type: Task
>  Components: Ignite RDD
>Affects Versions: 1.5.0.final
>Reporter: Vladimir Ozerov
>Assignee: Valentin Kulichenko
>  Labels: bigdata
> Fix For: 2.0
>
>
> We see increasing demand for good DataFrame support in our Spark integration. 
> We need to investigate how we could do that.
> It looks like we can study how MemSQL does it and take that as a starting 
> point.





[jira] [Resolved] (IGNITE-1072) Need to automatically support LocalDateTime class in indexing

2017-01-02 Thread Dmitry Karachentsev (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Karachentsev resolved IGNITE-1072.
-
Resolution: Duplicate

This seems to duplicate https://issues.apache.org/jira/browse/IGNITE-4172

> Need to automatically support LocalDateTime class in indexing
> -
>
> Key: IGNITE-1072
> URL: https://issues.apache.org/jira/browse/IGNITE-1072
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, SQL
>Affects Versions: 1.1.4
>Reporter: Valentin Kulichenko
>Assignee: Dmitry Karachentsev
>Priority: Critical
>  Labels: Usability
> Fix For: 2.0
>
>
> Currently this query doesn't work if {{localDate}} is an instance of 
> {{LocalDateTime}}:
> {code}
> SqlFieldsQuery qry = new SqlFieldsQuery(
> "SELECT localDate from MyObject " +
> "WHERE localDate = '2013-09-12T11:00'");
> {code}
> It's not supported because {{LocalDateTime}} is a JDK 8 class, but we can 
> probably use reflection to solve this.
> We also need to go through all classes in the {{java.time}} package and see 
> if any of them need to be supported as well.
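
The reflection idea could look roughly like the sketch below, which converts a {{LocalDateTime}} to {{java.sql.Timestamp}} without a compile-time dependency on {{java.time}}. The class and method names are hypothetical. {{Timestamp.valueOf(LocalDateTime)}} is a standard JDK 8 method; whether a conversion to {{Timestamp}} is the right mapping for indexing is an assumption.

```java
import java.lang.reflect.Method;
import java.sql.Timestamp;

/** Illustrative only: handles java.time.LocalDateTime via reflection when it is available. */
final class LocalDateTimeSupportSketch {
    /** java.time.LocalDateTime, or null on JDKs older than 8. */
    private static final Class<?> LOCAL_DATE_TIME = findClass("java.time.LocalDateTime");

    /** Timestamp.valueOf(LocalDateTime), or null when unavailable. */
    private static final Method TS_VALUE_OF = findValueOf();

    private static Class<?> findClass(String name) {
        try {
            return Class.forName(name);
        }
        catch (ClassNotFoundException e) {
            return null;
        }
    }

    private static Method findValueOf() {
        if (LOCAL_DATE_TIME == null)
            return null;

        try {
            return Timestamp.class.getMethod("valueOf", LOCAL_DATE_TIME);
        }
        catch (NoSuchMethodException e) {
            return null;
        }
    }

    /** Converts LocalDateTime values to Timestamp; any other value is returned unchanged. */
    static Object toSqlValue(Object val) {
        if (TS_VALUE_OF != null && LOCAL_DATE_TIME.isInstance(val)) {
            try {
                return TS_VALUE_OF.invoke(null, val);
            }
            catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Failed to convert " + val, e);
            }
        }

        return val;
    }
}
```

The same lookup pattern could be repeated for other {{java.time}} classes once it is decided which of them need support.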





[jira] [Created] (IGNITE-4516) Hadoop: add optional gzip compression for shuffle output

2017-01-02 Thread Vladimir Ozerov (JIRA)
Vladimir Ozerov created IGNITE-4516:
---

 Summary: Hadoop: add optional gzip compression for shuffle output
 Key: IGNITE-4516
 URL: https://issues.apache.org/jira/browse/IGNITE-4516
 Project: Ignite
  Issue Type: Task
  Components: hadoop
Reporter: Vladimir Ozerov
Assignee: Vladimir Ozerov
 Fix For: 2.0


This was already implemented, but then reverted due to a performance drop. 
However, the feature might still be very useful because it decreases the amount 
of heap memory required to store intermediate output for remote nodes. 

See commit 0062362.
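
For illustration, optional compression of an output buffer can be expressed with the JDK's gzip streams as sketched below. This is not the reverted implementation from the commit above; {{ShuffleGzipSketch}} and its boolean switch are hypothetical, and the real shuffle code works with its own buffers and messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative only: gzip is applied to a buffer when the (assumed) flag is set. */
final class ShuffleGzipSketch {
    /** Returns the data as is, or gzip-compressed when 'gzip' is true. */
    static byte[] encode(byte[] data, boolean gzip) {
        if (!gzip)
            return data;

        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        return bos.toByteArray();
    }

    /** Reverses encode(); the receiver must know whether compression was used. */
    static byte[] decode(byte[] data, boolean gzip) {
        if (!gzip)
            return data;

        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];

            for (int n; (n = in.read(buf)) != -1; )
                bos.write(buf, 0, n);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        return bos.toByteArray();
    }
}
```

Keeping the switch optional lets a deployment trade CPU time for a smaller heap footprint only where that trade is worthwhile.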


