[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r335160549 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/BatchAppenderators.java ## @@ -59,13 +63,21 @@ public static Appenderator newAppenderator( TaskToolbox toolbox, DataSchema dataSchema, AppenderatorConfig appenderatorConfig, + FirehoseFactory firehoseFactory, DataSegmentPusher segmentPusher ) { +final boolean isReingest; +if (firehoseFactory instanceof IngestSegmentFirehoseFactory) { + isReingest = dataSchema.getDataSource().equals(((IngestSegmentFirehoseFactory) firehoseFactory).getDataSource()); +} else { + isReingest = false; +} Review comment: For now, I think this config is best left undocumented and for auto compaction internal usage only. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r335159299 ## File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.timeline; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import org.apache.druid.indexer.partitions.PartitionsSpec; + +import java.util.Map; +import java.util.Objects; + +public class CompactionState +{ + private final PartitionsSpec partitionsSpec; + // org.apache.druid.segment.IndexSpec cannot be used here to avoid the dependency cycle Review comment: got it, thanks. maybe in next round of module merge: merge core into processing if there is no use case of anyone depending on druid-core directly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r335159299 ## File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.timeline; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import org.apache.druid.indexer.partitions.PartitionsSpec; + +import java.util.Map; +import java.util.Objects; + +public class CompactionState +{ + private final PartitionsSpec partitionsSpec; + // org.apache.druid.segment.IndexSpec cannot be used here to avoid the dependency cycle Review comment: got it, thanks. maybe in next round of module merge: merge core into processing if there is no use case of anyone depending on druid-core directly :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r335158395 ## File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.timeline; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import org.apache.druid.indexer.partitions.PartitionsSpec; + +import java.util.Map; +import java.util.Objects; + +public class CompactionState Review comment: LGTM, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334211148 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/BatchAppenderators.java ## @@ -59,13 +63,21 @@ public static Appenderator newAppenderator( TaskToolbox toolbox, DataSchema dataSchema, AppenderatorConfig appenderatorConfig, + FirehoseFactory firehoseFactory, DataSegmentPusher segmentPusher ) { +final boolean isReingest; +if (firehoseFactory instanceof IngestSegmentFirehoseFactory) { + isReingest = dataSchema.getDataSource().equals(((IngestSegmentFirehoseFactory) firehoseFactory).getDataSource()); +} else { + isReingest = false; +} Review comment: yeah , that would work and let user (in this case auto compaction code) explicitly say whether CompactionState should be saved or not. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334167505 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/BatchAppenderators.java ## @@ -59,13 +63,21 @@ public static Appenderator newAppenderator( TaskToolbox toolbox, DataSchema dataSchema, AppenderatorConfig appenderatorConfig, + FirehoseFactory firehoseFactory, DataSegmentPusher segmentPusher ) { +final boolean isReingest; +if (firehoseFactory instanceof IngestSegmentFirehoseFactory) { + isReingest = dataSchema.getDataSource().equals(((IngestSegmentFirehoseFactory) firehoseFactory).getDataSource()); +} else { + isReingest = false; +} Review comment: is it possible to drive this from auto compaction code in druid coordinator instead as `IngestSegmentFirehoseFactory` could be used outside of auto compaction as well. For example, as a user , knowing my data flow, I can setup re-index task to run every day for previous day's data .. sort of a manual compaction. But in that case, the `CompactionState` doesn't need to be preserved. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334163948 ## File path: docs/configuration/index.md ## @@ -786,8 +786,7 @@ A description of the compaction config is: |`dataSource`|dataSource name to be compacted.|yes| |`taskPriority`|[Priority](../ingestion/tasks.html#priority) of compaction task.|no (default = 25)| |`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per compaction task. Since a time chunk must be processed in its entirety, if the segments for a particular time chunk have a total size in bytes greater than this parameter, compaction will not run for that time chunk. Because each compaction task runs with a single thread, setting this value too far above 1–2GB will result in compaction tasks taking an excessive amount of time.|no (default = 419430400)| -|`targetCompactionSizeBytes`|The target segment size, for each segment, after compaction. The actual sizes of compacted segments might be slightly larger or smaller than this value. Each compaction task may generate more than one output segment, and it will try to keep each output segment close to this configured size. This configuration cannot be used together with `maxRowsPerSegment`.|no (default = 419430400)| Review comment: we should probably add a blurb in release notes for this , just in case some people set this property and expect something to happen. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334163388 ## File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.timeline; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import org.apache.druid.indexer.partitions.PartitionsSpec; + +import java.util.Map; +import java.util.Objects; + +public class CompactionState +{ + private final PartitionsSpec partitionsSpec; + // org.apache.druid.segment.IndexSpec cannot be used here to avoid the dependency cycle Review comment: couldn't understand what is the cycle ? .. do you mean `IndexSpec` can contain a `CompactionState` , so json serde would fail ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334162197 ## File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.timeline; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import org.apache.druid.indexer.partitions.PartitionsSpec; + +import java.util.Map; +import java.util.Objects; + +public class CompactionState Review comment: Can you please add javadoc for this class describing what this is, why it has the field it does ...(I know there is some discussion in proposal, but it would be very non-obvious for someone reading the code) and what guarantees it provides ... e.g. something like if a `CompactionTask` is run with parameters matching here then row distribution in segments created would be exactly same. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334162197 ## File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.timeline; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import org.apache.druid.indexer.partitions.PartitionsSpec; + +import java.util.Map; +import java.util.Objects; + +public class CompactionState Review comment: Can you please add javadoc for this class describing what this is and what guarantees it provides ... e.g. something like if a `CompactionTask` is run with parameters matching here then row distribution in segments created would be exactly same. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction
himanshug commented on a change in pull request #8573: Stateful auto compaction URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r328373459 ## File path: server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorSegmentCompactor.java ## @@ -178,12 +178,11 @@ private CoordinatorStats doRun( final List segmentsToCompact = iterator.next(); final String dataSourceName = segmentsToCompact.get(0).getDataSource(); - if (segmentsToCompact.size() > 1) { + if (!segmentsToCompact.isEmpty()) { Review comment: Line#179 should move in this if block. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org